The Road to Here
In our fast-paced society we consistently push the limits of technology and human-computer interaction, and the pace only quickens in the mad rush of innovation. Today it is likely safe to assume that your company’s employees and customers expect you to keep up.
First came the days when you needed a website to stay current. It wasn’t long before static websites gave way to dynamic content and web apps started to mature. Then we gradually transitioned to the Golden Age of the App: if you had an app, your IT staff could check that box.
With the app ecosystem becoming saturated, you need more innovation and personalization to differentiate yourself and to deliver the ease of use demanded of top-notch apps. Enter Speech Recognition.
The story of speech recognition goes back quite a way, too.

Hollywood has used Speech Recognition to thrill and excite us with memorable scenes, including:
- Iron Man – JARVIS
- 2001: A Space Odyssey – HAL 9000
- Star Trek: The Next Generation – The ship’s computer
Speech Recognition has actually been around for quite some time, but it was limited in scope. The proliferation of mobile phones and the maturation of Speech Recognition software and neural networks have made this a completely different ball game. There is speculation that 2017 will be the year of Voice Recognition. The error rate has dropped from 43% in 1995 to only 6.3% this year, putting it on par with humans.

Source: Benchmarks: Comparison of different architectures on TIMIT and large vocabulary tasks
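Benchmarks like these usually report a word error rate (WER), or a phone error rate on TIMIT: the number of substitutions, deletions, and insertions needed to turn the recognizer’s output into the reference transcript, divided by the number of words in the reference. As a minimal sketch, assuming simple whitespace tokenization and no normalization beyond lowercasing, WER is an edit distance computed over words:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words (Levenshtein distance over words).
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            same = ref[i - 1] == hyp[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + (0 if same else 1),  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))  # 0.4
```

Dropping “the” and hearing “light” for “lights” costs two edits over five reference words, a WER of 0.4 (40%). Production scoring tools add text normalization (punctuation, numbers, contractions) before counting.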
Voice Search: Usage Increasing Quickly
Ways to Interact With Voice
There are a handful of different ways you can use voice interactions to build your user experience. Which methods you choose depends largely on your existing assets and infrastructure, and on what you want to accomplish.
Voice Assistants: Siri, Google Now, Cortana
- Siri / Google Now Integration
- Users are familiar with this method of interaction
- Some limitations exist
Alexa / Google Home
- Rapid increase in sales of voice recognition hardware
- Requires voice-only interactions
Custom In-App Voice
- Engage your app users while they are using the app
- Must handle Natural Language Processing yourself
Web-Based Voice Recognition
- Enable voice commands for repetitive tasks
Voice Assistants: Siri, Google Now, Cortana
The Voice Assistants of yesteryear have grown up, and a latecomer has joined the party. They provide some cool and genuinely useful tools and integrations, but their use doesn’t stop there. Siri and Google’s assistants have opened up their platforms a bit, and Cortana is getting ready to. There are a lot of good options for integrating with these assistants.
Siri
SiriKit enables your iOS 10 apps to work with Siri, so users can get things done with your content and services using just their voice. Currently it only supports the following “intents” or capabilities:
- VoIP Calling
- Messaging
- Payments
- Photos
- Workouts
- Ride Booking
- CarPlay
- Restaurant Reservations
You can find more information in Apple’s SiriKit Programming Guide.
A pretty safe bet is that Apple is in the process of opening up custom actions, largely in response to market demands.
OK Google / Google Now / Google Assistant
Google Voice Actions come in two flavors:
System Actions include the following intents that you can integrate with:
- Alarm
- Communication
- Fitness
- Local
- Media
- Open
- Productivity
- Search
There are a lot of things that Google Voice Actions already recognize. This website is a great way to discover what’s possible.
You can define Custom Actions to support additional use cases.
Currently, custom actions are only available on Google Home and Pixel. Other devices will follow soon.
Cortana
From basic mobile deep links to full integration of your bots and services, the skills kit provides all the tools and docs you need to promote your services and engage users through the Cortana experience.
Once created, your skill works wherever your code runs. By registering your bots, services, mobile apps, and websites as Cortana skills, you connect these capabilities to over 145 million monthly active users.
People can interact with your skills in various ways. Cortana can offer a skill based on a natural language request during a conversation, or proactively present a skill based on a user’s preferences and context.
Look for the Cortana Skills Kit preview in early 2017.
The Cortana Skills Kit will allow developers to:
- Leverage bots created with the Microsoft Bot Framework and publish them to Cortana as a new skill
- Integrate their web services as skills and re-purpose code from their existing Alexa skills to create Cortana skills
- Connect users to skills when users ask, and proactively present skills to users in the appropriate context
- Personalize their users’ experiences by leveraging Cortana’s understanding of users’ preferences and context, based on user permissions
Cortana has apps on both iOS and Android.
Alexa / Google Home
The New Kids on the Block
Google Home and Amazon Echo (Alexa) are one more outlet for interacting digitally with your customers. They also extend your digital brand beyond the app, enhancing and simplifying your customers’ lives while connecting with them through digital means.
The Echo and Home are more than just speakers – they are built to help users at home, the location where the shopping experience begins. Both Alexa and Home can integrate with backend services allowing you to extend your brand. Although the market is still young, integrating with these devices can prove to be very beneficial.
Pros
- Users are already familiar with voice control
- They are invested in the platform
- Development platform capabilities are strong
Cons
- Voice-only interaction, known as a Voice User Interface (VUI)
Alexa Voice Services (Amazon Echo)
- Offer the most robust development tools
- Strongly positioned in the market
- Has shipped 5 million units and expects to double that in 2017
- Best external voice controlled device currently
Alexa Voice Services: Under the hood
- User Flow
- Alexa Skills Kit Architecture
- Alexa Skills
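At the end of the User Flow, Alexa sends your skill’s backend a JSON request describing what the user asked for and speaks whatever JSON response comes back. The sketch below shows that exchange as a plain Python function in the style of an AWS Lambda handler, without the official Alexa Skills Kit SDK. `LaunchRequest`, `IntentRequest`, `SessionEndedRequest`, `AMAZON.StopIntent`, and `AMAZON.CancelIntent` come from the Alexa Skills Kit request format; the `OrderStatusIntent` intent and its `OrderId` slot are hypothetical, standing in for whatever your skill defines. Treat it as an illustration of the request/response shape, not a production skill.

```python
# Minimal Alexa skill backend sketch: reads the ASK JSON request and returns
# the ASK JSON response envelope. OrderStatusIntent / OrderId are hypothetical.


def build_response(speech_text: str, end_session: bool = True) -> dict:
    """Wrap plain text in the Alexa Skills Kit response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event: dict, context=None) -> dict:
    """Route a skill request by request type and intent name."""
    request = event["request"]
    request_type = request["type"]

    if request_type == "LaunchRequest":
        # The user opened the skill without asking for anything specific.
        return build_response("Welcome. You can ask about an order.", end_session=False)

    if request_type == "IntentRequest":
        intent = request["intent"]
        name = intent["name"]
        if name == "OrderStatusIntent":  # hypothetical custom intent
            # A slot can be present without a "value" when the user didn't fill it.
            order_id = intent.get("slots", {}).get("OrderId", {}).get("value")
            if order_id:
                return build_response(f"Order {order_id} is on its way.")
            return build_response("Which order number?", end_session=False)
        if name in ("AMAZON.StopIntent", "AMAZON.CancelIntent"):
            return build_response("Goodbye.")

    if request_type == "SessionEndedRequest":
        # Alexa does not use a response to this request; return an empty envelope.
        return {"version": "1.0", "response": {}}

    # Anything unrecognized: keep the session open and reprompt.
    return build_response("Sorry, I didn't catch that. Which order?", end_session=False)


if __name__ == "__main__":
    sample = {
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "OrderStatusIntent",
                "slots": {"OrderId": {"name": "OrderId", "value": "1234"}},
            },
        }
    }
    print(lambda_handler(sample)["response"]["outputSpeech"]["text"])
    # prints: Order 1234 is on its way.
```

Setting `shouldEndSession` to false is what lets the skill ask a follow-up question instead of going silent, which is how a voice-only interface carries a multi-turn conversation.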
Google Home
Google Home is a Wi-Fi speaker that also works as a smart home control center and an assistant for the whole family. You can use it to play back entertainment throughout your entire house, effortlessly manage everyday tasks, and ask Google what you want to know.
In-App Speech Recognition
Bring Your Own Voice (BYOV)
There are a variety of voice interaction points between the user and the app. Triggering voice interactions from within the app offers a unique way to engage your users.
Pros
- Enhanced capabilities, fewer limitations
- Continue the voice conversation inside of the app
Cons
- Rolling your own solution takes expertise in several areas. If you want smart features that resemble a voice assistant, you will have to figure out how to handle:
- Voice recognition
- Understanding intent
- Triggering responses
- Voice replies
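Of the four pieces above, understanding intent is where most of the custom work lands. As a minimal sketch, assuming a platform speech recognizer has already produced a text transcript and a text-to-speech layer will speak the reply, a rule-based matcher can turn the transcript into an intent plus slots and choose a response. The intent names, patterns, and replies here are hypothetical; real apps typically replace the regex table with a trained natural language understanding model.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Intent:
    name: str
    slots: Dict[str, Optional[str]] = field(default_factory=dict)


# Hypothetical intents: each regex maps a transcript to an intent name,
# and its named groups become slots.
INTENT_PATTERNS = [
    ("CheckBalance", re.compile(r"\b(?:what'?s|check) my balance\b")),
    (
        "TransferMoney",
        re.compile(r"\btransfer \$?(?P<amount>\d+(?:\.\d{2})?) to (?P<account>\w+)\b"),
    ),
]


def understand(transcript: str) -> Intent:
    """Understanding intent: map a recognized transcript to an intent plus slots."""
    text = transcript.lower().strip()
    for name, pattern in INTENT_PATTERNS:
        match = pattern.search(text)
        if match:
            return Intent(name, match.groupdict())
    return Intent("Fallback")


def respond(intent: Intent) -> str:
    """Triggering responses: choose the text a voice reply would speak."""
    if intent.name == "CheckBalance":
        return "Here is your current balance."
    if intent.name == "TransferMoney":
        amount, account = intent.slots["amount"], intent.slots["account"]
        return f"Transfer ${amount} to {account}. Should I go ahead?"
    return "Sorry, I can help with balances and transfers."


if __name__ == "__main__":
    # Voice recognition is assumed to be done already by the platform recognizer.
    for heard in ["What's my balance?", "Transfer $50 to savings", "Order a pizza"]:
        print(respond(understand(heard)))
```

A regex table keeps the sketch readable, but it only matches phrasings you anticipate, which is exactly why handling Natural Language Processing yourself is listed as a cost of in-app voice.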
iOS
Apple’s Speech framework (SFSpeechRecognizer) enables Speech Recognition in apps on iOS 10 and later.
Android
Android’s android.speech package (SpeechRecognizer) enables Speech Recognition in Android apps.
Web-Based Voice Recognition
Circling back to where we began, we can’t leave web-based voice recognition out of the equation. If you are using Chrome or Firefox, you may have noticed that this page supports Speech Recognition. This capability comes from the Web Speech API, which notably also handles Speech Synthesis.
This has been possible for several years now, but it hasn’t been put to much good use. Web-based voice recognition has a lot in common with in-app voice recognition: you have to handle everything yourself.
Voice User Interface (VUI)
A body of research has shown that people infer personality traits from even the briefest voice interactions. Voice is a form of Human Computer Interaction (HCI) that does exactly what the name implies: it humanizes the interaction. Because of this, it is important to give special consideration to how you communicate with the user.
Although much good advice for Graphical User Interfaces (GUIs) may apply, don’t try to simply convert your GUI into a VUI. There’s a lot more to think about.
Here are some tips for designing conversations, from a Google video about the Google Assistant:
Create a persona: The “face” of the company.
- Leverage your brand.
- List brand core attributes that can be conveyed in voice
- Write a bio sketch of this persona, and perhaps give it a name
- Serves as a grounding mechanism to fall back on for consistency
- Define yourself as separate from the Google Assistant
- Greet the user
Think outside the box
- Don’t start with code
- Write out core experiences like you would a screenplay
- Keep it simple
Context matters
- Who is the user?
- Where are they?
- What are they doing?
- What type of device are they acting on?
- How is the experience influenced over time?
- Cater to the user’s intent, not a feature
In Conversation there are no Errors
- There are limitations, but recognize them for what they are
- Take voice input “errors” and make them into a meaningful conversation
- Look at the interaction from the user’s perspective
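One common way to turn a missed recognition into conversation rather than a dead-end error is to escalate the reprompt: ask briefly first, then offer concrete options, then exit gracefully. The sketch below assumes a counter of consecutive misses kept by your own session logic; the wording and the three-attempt limit are hypothetical choices for illustration.

```python
from typing import Tuple

# Hypothetical reprompts, escalating from a quick retry to concrete options
# to a graceful exit.
NO_MATCH_REPROMPTS = [
    "Sorry, what was that?",
    "I can check an order or find a store near you. Which would you like?",
    "I'm still having trouble. You can also do this in the app. Talk to you soon.",
]


def no_match_reply(attempt: int) -> Tuple[str, bool]:
    """Return (reply, end_conversation) for the 1-based count of consecutive misses."""
    if attempt < 1:
        raise ValueError("attempt counts from 1")
    # Past the last reprompt, keep reusing the exit message.
    index = min(attempt, len(NO_MATCH_REPROMPTS)) - 1
    # Only the final reprompt closes the conversation.
    end_conversation = attempt >= len(NO_MATCH_REPROMPTS)
    return NO_MATCH_REPROMPTS[index], end_conversation


for attempt in (1, 2, 3):
    print(no_match_reply(attempt))
```

Resetting the counter whenever an input does match keeps the escalation tied to consecutive misses, so one stumble early on doesn’t push the user toward the exit later.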
Think bigger
- Starting simple is good but…
- Don’t limit yourself here.
- Help somebody gain access to information that they didn’t have before
Communication is Key
If you can communicate well, you will engage and even entertain your users. But it’s not all smooth sailing from here on out, because a lot is going on in every voice interaction:
- Voice Activation
- Speech Recognition & Transcription
- Intent and Meaning
- Data Search & Query
- Speech Response
Additional References for Voice Design
Voice Design
- Voice Design Best Practices
- Defining the Voice Interface
- Developing Alexa Skills – Sessions and Voice User Interfaces
Case Studies
- Capital One’s case study about how they implemented Alexa Voice Services
- Domino’s launches a voice controlled pizza ordering sidekick