We Hear You Loud And Clear - Speech Recognition

The Road to Here

In our fast-paced society we consistently push the limits of technology and human-computer interaction. The pace only continues to quicken in the mad rush of innovation. Today it is likely safe to assume that your company’s employees and customers expect the same.

First came the days when you needed a website to be current. It wasn’t long before static websites moved to dynamic content and web apps started to mature. Then we gradually transitioned to the Golden Age of the App. If you had an app your IT staff could check that block.

With the app ecosystem starting to become saturated, you need more innovation and personalization to differentiate yourself and to deliver the ease of use that is demanded of top-notch apps. Enter Speech Recognition.

Speech recognition’s future goes back quite a way too.

Hollywood has used Speech Recognition to thrill and excite us with memorable scenes, including:

Iron Man – Jarvis
2001: A Space Odyssey – HAL 9000
Star Trek: The Next Generation – The ship’s computer

Speech Recognition has actually been around for quite some time, but it was quite limited in scope. The proliferation of mobile phones and the maturation of Speech Recognition software and neural networks have made this a completely different ball game. There is speculation that 2017 is the year of Voice Recognition. The error rate has dropped from 43% in 1995 to only 6.3% this year and is now on par with humans.

Source: Benchmarks: Comparison of different architectures on TIMIT and large vocabulary tasks

Voice Search: Usage Increasing Quickly

Ways to Interact With Voice

There are a handful of different ways that you can use voice interactions to build your user experience. Which methods you choose depends largely on your existing assets and infrastructure, and on what you want to accomplish.

Voice Assistants: Siri, Google Now, Cortana

Siri / Google Now Integration
Users are familiar with this method of interaction
Some limitations exist

Alexa / Google Home

Rapid increase in sales of voice recognition hardware
Requires voice-only interactions

Custom In-App Voice

Engage your app users while they are using the app
Must handle Natural Language Processing yourself

Web-Based Voice Recognition

Enable voice commands for repetitive tasks

Voice Assistants: Siri, Google Now, Cortana

The Voice Assistants of yesteryear have grown up, and a late arrival has joined the party. They provide some cool and genuinely useful tools and integrations, but their use doesn’t stop there. Siri and Google’s assistants have opened up their platforms a bit, and Cortana is getting ready to do the same. There are a lot of good options for integrating with these assistants.

Siri

SiriKit enables your iOS 10 apps to work with Siri, so users can get things done with your content and services using just their voice. Currently it only offers interactions with the following “intents”, or capabilities:

VoIP Calling
Messaging
Payments
Photos
Workouts
Ride Booking
CarPlay
Restaurant Reservations

You can find more information in Apple’s SiriKit Programming Guide.

A pretty safe bet is that Apple is in the process of opening up custom actions, largely in response to market demands.

OK Google / Google Now / Google Assistant

Google Voice Actions come in two flavors:

System Actions include the following intents that you can integrate with:

Alarm
Communication
Fitness
Local
Media
Open
Productivity
Search

There are a lot of things that Google Voice Actions already recognize. This website is a great way to discover what’s possible.

You can define Custom Actions to support additional use cases.

Currently, custom actions are only available on Google Home and Pixel. Other devices will follow soon.

Cortana

From basic mobile deep links to full integration of your bots and services, the skills kit provides all the tools and docs you need to promote your services and engage users through the Cortana experience.
Once created, your skill works wherever your code runs. By registering your bots, services, mobile apps, and websites as Cortana skills, over 145 million active monthly users will be connected to these capabilities.
People can interact with your skills in various ways. Cortana can offer a skill based on a natural language request during a conversation, or proactively present a skill based on a user’s preferences and context.

Look for the Cortana Skills Kit preview in early 2017.

The Cortana Skills Kit will allow developers to:

Leverage bots created with the Microsoft Bot Framework and publish them to Cortana as a new skill
Integrate their web services as skills and re-purpose code from their existing Alexa skills to create Cortana skills
Connect users to skills when users ask, and proactively present skills to users in the appropriate context
Personalize their users’ experiences by leveraging Cortana’s understanding of users’ preferences and context, based on user permissions

Cortana has apps on both iOS and Android.

Alexa / Google Home

The New Kids on the Block

Google Home and Amazon Echo (Alexa) are one more outlet for interacting digitally with your customers. Furthermore, they extend your digital brand outside of the app, still enhancing and simplifying your customers’ lives while connecting with them through digital means.

The Echo and Home are more than just speakers – they are built to help users at home, the location where the shopping experience begins. Both Alexa and Home can integrate with backend services allowing you to extend your brand. Although the market is still young, integrating with these devices can prove to be very beneficial.

Pros

Users are already familiar with voice control
They are invested in the platform
Development platform capabilities are strong

Cons

Voice-only interaction, called a Voice User Interface (VUI)

Alexa Voice Services (Amazon Echo)

Offers the most robust development tools
Strongly positioned in the market
Shipped 5 million units and expects to double that in 2017
Currently the best external voice-controlled device

Alexa Voice Services: Under the hood

User Flow
Alexa Skills Kit Architecture
Alexa Skills
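
To make the backend integration a little more concrete, here is a minimal Python sketch of how a custom skill’s web service might answer Alexa. The JSON envelope (version, response, outputSpeech, reprompt, shouldEndSession) follows the Alexa Skills Kit custom skill format. Everything else is an assumption made up for illustration: the OrderStatusIntent intent, the OrderId slot, the function names, and the spoken text. Treat it as a starting point, not as Amazon’s reference implementation.

```python
import json
from typing import Optional


def build_response(speech: str, end_session: bool, reprompt: Optional[str] = None) -> dict:
    """Wrap plain text in the JSON envelope a custom skill returns to Alexa."""
    response = {
        "outputSpeech": {"type": "PlainText", "text": speech},
        "shouldEndSession": end_session,
    }
    if reprompt is not None:
        # Spoken if the user stays silent while the session is still open.
        response["reprompt"] = {"outputSpeech": {"type": "PlainText", "text": reprompt}}
    return {"version": "1.0", "response": response}


def handle_request(event: dict) -> dict:
    """Route an incoming skill request to a spoken reply."""
    request = event.get("request", {})
    request_type = request.get("type")

    if request_type == "LaunchRequest":
        return build_response(
            "Welcome. What would you like to do?", False, "You can ask about an order."
        )

    if request_type == "IntentRequest":
        intent = request.get("intent", {})
        # "OrderStatusIntent" and its "OrderId" slot are hypothetical names.
        if intent.get("name") == "OrderStatusIntent":
            order_id = intent.get("slots", {}).get("OrderId", {}).get("value")
            if order_id:
                return build_response(f"Order {order_id} is on its way.", True)
            return build_response("Which order number?", False, "Please say your order number.")

    return build_response("Sorry, I can't help with that yet.", True)


if __name__ == "__main__":
    sample = {
        "request": {
            "type": "IntentRequest",
            "intent": {
                "name": "OrderStatusIntent",
                "slots": {"OrderId": {"name": "OrderId", "value": "42"}},
            },
        }
    }
    print(json.dumps(handle_request(sample), indent=2))
```

Keeping the session open when a slot is missing lets the skill ask a follow-up question instead of ending the conversation.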

Google Home

Google Home is a Wi-Fi speaker that also works as a smart home control center and an assistant for the whole family. You can use it to play back entertainment throughout your entire house, effortlessly manage everyday tasks, and ask Google what you want to know.

In-App Speech Recognition

Bring Your Own Voice (BYOV)

There are a variety of voice interaction points between the user and the app. Triggering voice interactions from within the app offers a unique method to engage your users.

Pros

Enhanced capabilities, fewer limitations
Continue the voice conversation inside of the app

Cons

Rolling your own solution takes expertise in several areas. If you want smart features that resemble a voice assistant, you will have to figure out how to handle:
Voice recognition
Understanding intent
Triggering responses
Voice replies
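
To give a feel for what handling these pieces yourself involves, here is a rough Python sketch of the middle of that chain. It assumes the platform’s speech recognizer has already produced a text transcript, and it returns the reply as text that would then be handed to a text-to-speech engine. The intent names, the regular expressions, and the replies are all hypothetical. Simple pattern matching stands in for real Natural Language Processing, which would be far more forgiving of phrasing.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Intent:
    name: str
    slots: Dict[str, str] = field(default_factory=dict)


# Hypothetical intents; each pattern is tried against the lowercased transcript.
INTENT_PATTERNS = {
    "check_balance": re.compile(r"\b(?:what is|what's|check) my balance\b"),
    "order_pizza": re.compile(r"\border (?:a|one) (?P<size>small|medium|large) pizza\b"),
}


def understand_intent(transcript: str) -> Optional[Intent]:
    """Understanding intent: map a transcript to an intent plus any slots."""
    text = transcript.lower().strip()
    for name, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots = {k: v for k, v in match.groupdict().items() if v is not None}
            return Intent(name, slots)
    return None


# Triggering responses: one handler per intent, each returning the reply text.
HANDLERS: Dict[str, Callable[[Intent], str]] = {
    "check_balance": lambda intent: "Your balance is available in the app.",
    "order_pizza": lambda intent: f"Ordering one {intent.slots['size']} pizza.",
}


def respond(transcript: str) -> str:
    """Voice replies: return the text that would be passed to text-to-speech."""
    intent = understand_intent(transcript)
    if intent is None:
        return "Sorry, I didn't catch that. Could you rephrase?"
    return HANDLERS[intent.name](intent)


if __name__ == "__main__":
    print(respond("Please order a large pizza"))
```

Keeping intent detection separate from the handlers means the pattern matcher can later be swapped for a proper language-understanding service without touching the responses.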

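iOS

Apple’s Speech framework (SFSpeechRecognizer), introduced with iOS 10, is the library to enable Speech Recognition.

Android

Android’s SpeechRecognizer class (android.speech) is the library to enable Speech Recognition.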

Web-Based Voice Recognition

Circling back around to where we began, we can’t leave web-based voice recognition out of the equation. If you are using Chrome or Firefox you may have noticed that this page supports Speech Recognition. This capability comes from the Web Speech API. Of particular note, it also handles Speech Synthesis.

This has been possible for several years now but it hasn’t been put to much good use. Web-based voice recognition shares a lot of similarity with in-app voice recognition in that you have to handle everything yourself.

Voice User Interface (VUI)

A corpus of research has shown that people infer personality traits from even the briefest voice interactions. Voice is a form of Human-Computer Interaction (HCI) that does exactly what the name implies: it humanizes the interaction. Because of this, it is important to give special consideration to how you communicate with the user.

Although much good advice for Graphical User Interfaces (GUIs) may apply, don’t try to simply convert your GUI into a VUI. There’s a lot more to think about.

Here are some tips for designing conversations, from a Google video about Google Assistant:

Create a persona: The “face” of the company.

Leverage your brand.
List brand core attributes that can be conveyed in voice
Write a bio-sketch of this persona, and perhaps give it a name
Serves as a grounding mechanism to fall back on for consistency
Define yourself as separate from the Google Assistant
Greet the user

Think outside the box

Don’t start with code
Write out core experiences like you would a screenplay
Keep it simple

Context matters

Where is the user?
What are they doing?
What type of device are they acting on?
How is the experience influenced over time?
Cater to the user’s intent, not a feature

In Conversation there are no Errors

There are limitations, but recognize them for what they are
Take voice input “errors” and make them into a meaningful conversation
Look at the interaction from the user’s perspective
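
One way to turn a misunderstood answer into conversation rather than a dead end is to reprompt with a little more help each time, and to hand off gracefully if that still doesn’t work. The Python sketch below simulates such a dialog for a single question. The pizza-size scenario, the prompts, and the function names are all hypothetical, and the user’s turns are passed in as text instead of coming from a real recognizer.

```python
import re
from typing import List, Optional

VALID_SIZES = ("small", "medium", "large")

# Each reprompt gives more guidance than the one before it.
REPROMPTS = [
    "Sorry, what size would you like?",
    "You can say small, medium, or large. Which would you like?",
]


def extract_size(utterance: str) -> Optional[str]:
    """Return the first valid size mentioned in the utterance, if any."""
    words = re.findall(r"[a-z]+", utterance.lower())
    for size in VALID_SIZES:
        if size in words:
            return size
    return None


def run_size_dialog(user_turns: List[str]) -> List[str]:
    """Simulate the dialog and return everything the assistant says, in order."""
    spoken = ["What size pizza would you like?"]
    misses = 0
    for utterance in user_turns:
        size = extract_size(utterance)
        if size:
            spoken.append(f"Got it, one {size} pizza.")
            return spoken
        if misses < len(REPROMPTS):
            # Not understood yet: keep the conversation going with more help.
            spoken.append(REPROMPTS[misses])
            misses += 1
        else:
            # Out of reprompts: hand off instead of repeating the same question.
            spoken.append("No problem. I've opened the menu in the app so you can pick there.")
            return spoken
    return spoken


if __name__ == "__main__":
    for line in run_size_dialog(["uh, the big one", "Large, please."]):
        print(line)
```

Notice that the assistant never says “error” or “invalid input”. From the user’s side, it just sounds like someone asking a clarifying question.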

Think bigger

Starting simple is good but…
Don’t limit yourself here.
Help somebody gain access to information that they didn’t have before

Communication is Key

If you can communicate well, you will engage and even entertain. But it’s not clear sailing from here on out, because a lot is going on when you deal with voice interactions:

Voice Activation
Speech Recognition & Transcription
Intent and Meaning
Data Search & Query
Speech Response

Additional References for Voice Design

Voice Design

Case Studies

Capital One’s case study about how they implemented Alexa Voice Services
Domino’s launches a voice-controlled pizza ordering sidekick
