Voice communication, the most natural form of interaction for humans, is the new frontier in user experience. People can now interact with machines using natural language and digital assistants. The integration of voice user interfaces (VUI) into applications is a growing UX trend for 2022 and beyond, with more people now owning voice-controlled speakers than ever. So, why are voice user interfaces the future?
What is a voice user interface?
A voice user interface (VUI) is a speech recognition technology that allows users to interact with computer systems through voice commands. Prime examples include Google’s Assistant, Amazon’s Alexa, and Apple’s Siri. Human–computer interaction has come a long way since 1950, when Alan Turing introduced his test of whether a machine could hold a conversation that closely resembles human speech. Today, we have VUIs capable of holding near-human conversations.
There are many classes of user interfaces, including text-based natural language interfaces such as chatbots. Chatbots are conversational interfaces that allow humans to interact with computer systems through written text or, in voice-enabled variants, spoken commands. There are two main types:
- Block chatbot: this uses databases and ready-made scenario responses triggered by text commands, e.g., after the user types the word “contact,” the bot displays contact options or navigation buttons directly in the chat window.
- Conversational chatbot: this uses natural language processing and machine learning to handle conversations with users by learning from the questions asked and recognizing context statements.
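The block-style bot above can be sketched in a few lines: a lookup table maps trigger words to ready-made responses, with a fallback for anything unrecognized. The trigger words and replies below are invented for illustration.

```python
# Minimal sketch of a "block" chatbot: trigger words map to
# ready-made responses, with a fallback for unknown input.
# The keywords and replies are hypothetical examples.
RESPONSES = {
    "contact": "You can reach us via: [Email] [Phone] [Live chat]",
    "hours": "We are open Monday-Friday, 9:00-17:00.",
    "pricing": "See our pricing tiers: [Basic] [Pro] [Enterprise]",
}

def block_bot(message: str) -> str:
    """Return the first canned response whose trigger word appears
    in the user's message, or a fallback prompt listing the triggers."""
    text = message.lower()
    for trigger, reply in RESPONSES.items():
        if trigger in text:
            return reply
    return "Sorry, I didn't understand. Try: " + ", ".join(RESPONSES)

print(block_bot("How do I contact you?"))
```

A conversational chatbot, by contrast, would replace the keyword lookup with a trained natural language understanding model rather than exact string matching.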
A brief history of voice user interfaces
The history of voice user interfaces goes back to 1952, when engineers at Bell Labs developed an automatic digit recognizer called Audrey. Due to the limited technology of the time, the system could only recognize the spoken digits 0–9. Interactive voice response (IVR) systems followed, later refined by companies such as SpeechWorks and Nuance; these systems could recognize natural language over a phone call and perform simple tasks.
However, it wasn’t until the 2000s that voice user interfaces became mainstream. In 2001, Ivona Software developed award-winning text-to-speech technology for use with user interfaces on communication systems and services, computers, and mobile devices. Amazon later acquired the company, and since 2013 the technology has been developed at the Amazon Development Center in Poland.
Apple officially launched Siri in 2011 to enable voice interaction between humans and machines. Google Now, a voice-activated personal assistant, followed in 2012, and Amazon’s Alexa arrived in 2014. The use of voice user interfaces has continued to grow, with other industry players joining in: Samsung introduced Bixby in 2017, Xiaomi its XiaoAI assistant in 2018, and Huawei Celia in 2020.
Different ways voice user interfaces are used
Voice user interfaces are used in several ways, such as voice commands, text-to-voice technology (reading texts for visually impaired people), speech-to-text technology, and voice bots/conversational interfaces. You can find VUIs in various applications, including:
- smart TVs
- IoT devices
- smart displays
- mobile phones
How voice commands are changing the way we search for information
The advent and increasing use of voice user interfaces has changed how we interact with technology. Voice interactions allow users to verbally search for information in a natural language, using a smart device rather than typing a text query into a search bar. This query is then answered using a digital assistant. Similarly, when people use apps on their mobile devices, they can access content without needing to learn the content structure of the app or its visual UI.
A report by Juniper Research indicates eight billion digital voice assistants will be in use by 2023. The same firm predicts a significant increase in voice-based ad revenue, which is expected to be facilitated by the development of voice search applications, especially on mobile devices.
While organic search will remain the best way for brands to reach consumers, the popularity of voice search is bound to grow. As a result, advertising marketers and agencies will put pressure on Amazon and Google, renowned industry giants in designing voice user interfaces, to open their platforms to new and better forms of paid messages that correspond to the current paid search results.
Benefits of voice user interfaces
The main benefit of a voice user interface is a customized user experience. Unlike graphical user interfaces, which aren’t easy to personalize, a voice-based interface can be trained to recognize your voice. If you are using Amazon Alexa, for example, you can say “learn my voice” to set up separate voice profiles for different users. This way, Alexa can detect who is speaking and offer a personalized UX.
Voice assistants also offer customized experiences through voice shopping: users can conveniently search for and purchase items anytime, anywhere, even while doing other things. Using AI, voice assistants can draw on stored preferences and previous search history to produce more relevant, personalized results.
The impact of the COVID-19 pandemic on voice user interface development
The COVID-19 pandemic transformed our lifestyles, changing how we use mobile devices and voice interfaces. The crisis also fostered innovation: from 2020 onward, the shift from touch to speech recognition via AI-based virtual assistants and chatbots played a role in fighting the pandemic. People who could not leave their homes due to restrictions could still access healthcare services through voice interaction.
While chatbots were effective in monitoring the health state of patients, VUI applications such as Apple’s Siri allowed users to check whether they had COVID-19 symptoms. For instance, if a user told Siri “I think I have COVID-19,” the assistant would start a conversation tree to determine the user’s current symptoms, prompting them to answer each voice or text prompt with “yes,” “no,” or “not sure.” If the symptoms matched, Siri would recommend telemedicine apps.
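The kind of conversation tree described above can be modeled as nested questions, where each “yes,” “no,” or “not sure” answer selects the next node until a recommendation is reached. This is an illustrative sketch only, not Apple’s actual logic; the questions and recommendations are made up.

```python
# Illustrative symptom-checker conversation tree (not Apple's actual
# logic): each inner node is a question plus one branch per answer,
# and each leaf (a plain string) is a final recommendation.
TREE = {
    "question": "Do you have a fever?",
    "yes": {
        "question": "Do you have a persistent cough?",
        "yes": "Please contact a telemedicine service.",
        "no": "Monitor your symptoms and stay home.",
        "not sure": "Monitor your symptoms and stay home.",
    },
    "no": "Your symptoms do not match; stay alert.",
    "not sure": "Check your temperature and ask again.",
}

def walk(tree: dict, answers: list) -> str:
    """Follow the user's answers down the tree; return either a final
    recommendation (leaf) or the next question to ask."""
    node = tree
    for answer in answers:
        node = node[answer]
        if isinstance(node, str):   # reached a leaf / recommendation
            return node
    return node["question"]         # more questions remain

print(walk(TREE, ["yes", "yes"]))
```

A production assistant would pair a tree like this with speech recognition for the answers and text-to-speech for the prompts.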
With an accessible and user-friendly interface, voice interface AI chatbots are providing healthcare facilities with a cleaner and safer method of interaction. Even in the post-pandemic world, people still expect voice AI solutions to meet their needs since they are efficient and comfortable.
The use of touch screens, especially in public spaces, has become troublesome of late. Unlike in the pre-pandemic period when most people didn’t give a second thought to touch screens, more people now prefer contactless interactions. This shows that voice user interfaces are becoming vital, just like graphical user interfaces. The result is that voice interactions will become even more helpful in healthcare, where contactless operations are usually encouraged to avoid the rapid spread of infections.
Off-the-shelf solutions already available on the market
Voice assistants have enhanced interaction between humans and machines. With voice interfaces and interactions becoming increasingly common, there are several off-the-shelf solutions already available on the market, including:
#1 Amazon Transcribe
Amazon Transcribe, an automatic speech recognition service, uses machine learning models to convert speech to text. Given a transcription request, the service produces a transcript with metadata, including timestamps and confidence scores for each word and punctuation mark. It’s used in natural language processing to convert voice commands into text.
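As a sketch of how such a transcription request might look with the AWS SDK for Python (boto3), the helper below only assembles the request parameters; the job name and S3 URI are placeholders, and the actual API call (which requires boto3 and AWS credentials) is shown in a comment.

```python
def build_transcribe_request(job_name: str, media_uri: str) -> dict:
    """Assemble the parameters for an Amazon Transcribe batch job.
    The job name and S3 URI are caller-supplied placeholders."""
    return {
        "TranscriptionJobName": job_name,
        "Media": {"MediaFileUri": media_uri},
        "MediaFormat": "mp3",
        "LanguageCode": "en-US",
    }

params = build_transcribe_request("demo-job", "s3://example-bucket/audio.mp3")
print(params)

# To actually start the job (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("transcribe")
#   client.start_transcription_job(**params)
# The job runs asynchronously; poll its status until it completes,
# then fetch the transcript (with per-word timestamps and confidence
# scores) from the returned transcript URI.
```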
#2 Amazon Transcribe Medical
This is an automatic speech recognition service specifically made for medical use. Amazon Transcribe Medical can effectively create accurate transcriptions from medical consultations between physicians and patients. This includes medical procedures, names, and diseases. It’s available in the form of public APIs that can be used to address both real-time voice-to-text applications and batch workloads.
#3 Cloud Speech-to-Text by Google
Cloud Speech-to-Text by Google converts audio to text by using powerful neural network models. Users can transcribe content from stored files or in real time. It’s available in over 125 languages and variants, supporting a global user base. This API has improved customer service by implementing voice commands and accurately transcribing multimedia content. It offers a high level of accuracy for voice-to-text transcriptions.
#4 Google Conversational Actions
Developed for Google Assistant, Conversational Actions are apps designed primarily for smart speakers, though they work with any Google Assistant device. Similar to Amazon’s Alexa Skills, they let third parties build their own voice apps or chatbots with customized dialogue, making the service one of the best choices for tailoring speech recognition to a specific use case.
#5 Google Dialogflow
Dialogflow is a platform that understands natural conversation, making it easy to design a conversational user interface and integrate it into web apps, mobile apps, bots, voice command devices, and more. The platform is ideal for businesses: it can analyze several types of customer input, including audio and text, and consequently respond through synthetic speech or text.
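A detect-intent request to Dialogflow can be sketched as follows. The helper only assembles the request body; the project ID, session path, and user text are placeholders, and the actual client call (which needs the google-cloud-dialogflow package and GCP credentials) is shown in a comment.

```python
def build_detect_intent_request(session: str, text: str,
                                language_code: str = "en") -> dict:
    """Assemble a Dialogflow detect-intent request body.
    The session path and user text are caller-supplied placeholders."""
    return {
        "session": session,
        "query_input": {
            "text": {"text": text, "language_code": language_code},
        },
    }

request = build_detect_intent_request(
    "projects/my-project/agent/sessions/session-123",  # placeholder path
    "I think I need an appointment",
)
print(request)

# To actually send it (requires google-cloud-dialogflow and credentials):
#   from google.cloud import dialogflow
#   client = dialogflow.SessionsClient()
#   response = client.detect_intent(request=request)
#   print(response.query_result.fulfillment_text)
```

Dialogflow matches the text against the agent’s trained intents and returns a fulfillment response, which the application can render as text or synthetic speech.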
VUIs will be more and more popular
The use of voice user interface applications is about to explode. Corporate giants such as Apple and Amazon have brought virtual assistants such as Siri and Alexa into the mainstream, with other companies following their lead. According to MarTech, roughly 1 in 4 adults in the US owns at least one smart speaker, and as of 2020, there were around 157 million such devices in American homes. Meanwhile, by the third quarter of 2020, global sales of smart displays had grown by 21% to 9.5 million units, while sales of basic smart speakers declined by 3%. This shift offers insight into changing user preferences.
It’s important to note that smart display technology is one of the driving factors behind the change in user experience. With a smart display, you can make calls, turn devices on or off, or search for information simply by asking the assistant. This hands-free communication has kickstarted the era of smart homes, with more people integrating voice-based devices into their houses. For instance, Amazon’s Echo Show smart display features motion-tracking technology that lets the camera follow you around the room during video calls.
It is expected that over 90% of smartphone users will be using voice assistants by 2023. Meanwhile, modern cars increasingly ship with Android Auto or Apple CarPlay, extending smartphone voice assistants to the road. A report by the UK’s Transport Research Laboratory indicates that driver distraction is much lower with voice-activated systems than with touch screen technologies, so such systems can help enhance road safety.
Voice user interfaces are currently in use in our daily lives more than we think. Digital voice assistants are found in many places, with Google Assistant, Siri, and Alexa controlling over 3 billion devices. By 2023, almost every app will be using a voice user interface in some way. The speech and voice user interface sector is expected to reach $24.9 billion by 2025.
Voice user interfaces have enhanced how people interact with computing systems. They provide a hands- and eyes-free way of accessing information from smart devices. These systems can be added to home automation systems, automobiles, appliances, and computer operating systems.
Voice UI is changing the way we search for information. You can now issue a voice command and a voice assistant will answer your query. The main advantage of such systems is personalizing the user experience. You can customize a voice interface to suit your specific needs. It’s possible to create multiple voice profiles for various users with different accounts, enhancing individual accessibility.
The use of voice assistants and voice search is growing significantly, with these services becoming popular across all age groups. Such systems work with little supervision and can often make good choices on the user’s behalf. Voice UI proved especially useful during the COVID-19 pandemic, improving access to healthcare services by personalizing search results and reducing the need to touch shared surfaces. From all these considerations, it’s evident that VUI will play an ever larger role in our daily lives, with trends indicating that the voice interface market will hit $24.9 billion by 2025.