How To Design Great Skills For Voice Assistants in 2019

Get into the right mindset to start designing your next skill by reviewing a few key elements of voice interfaces and user experience.

Voice as an interface lets a company define its character and adopt different moods and dynamics according to the context and the user it’s interacting with. Forget customer segments: we’re talking about local treasure troves of data sitting at the edge of the web, allowing the version of you living with a customer to behave differently from the one across the street. Now that is a unique “customer relationship”.

Voice is possibly the first tech wave that’s about nailing the culture. It’s about how people speak, not how you speak.

Users don’t need the latest firmware, updates and so on. Speech is about understanding people, not talking to them. The interactions are more spontaneous, the cache has a shorter lifespan… Think of younger generations and their use of TikTok and Snapchat. Excited? Cool. Let’s start digging.

What’s Worth Doing Right Now?

A report from CapTech Consulting found that music still dominates use cases for smart speakers (reported by 82% of owners), with the second most popular use being inquiries and information gathering (42%).

Some of these inquiries directly speak to a relevant Skill or Action (for Google Home), but in order for them to work, the phrasing needs to be very specific. This currently causes quite a bit of frustration to some owners, including the infamous toddler who painfully tried for five minutes to make Alexa play her favorite song.

As to Google’s virtual assistant, it’s said to be a lot more powerful and accurate than Siri, Cortana or Alexa, thanks to its search algorithm, access to content and natural language processing (NLP) technology.

While the idea of smart speakers and smart homes may appeal to young working adults, voice commands in general can give access to a whole new world of content to a much broader audience. From toddlers to seniors who may not be so comfortable with touch screens, voice literally speaks to different personas. These people don’t normally interact with brands digitally, which creates an opportunity to expand your knowledge of their wants and needs.

Optimize your content for voice

In the broader voice search category (including searches from mobile), a Backlinko analysis of Google Home results found that the average voice search answer is 29 words long.

That’s not to say web pages should only contain a handful of words; the average page behind a voice search result is actually 2,312 words long.

What this means is Google can draw from long-form content, provided it’s written in an accessible way. Structured data markup from schema.org is one way to optimize your content.
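As a rough illustration of structured data markup, here is a minimal sketch that builds a schema.org `FAQPage` object in JSON-LD form. The `faq_jsonld` helper and the sample question are made up for this example; the `FAQPage`, `Question` and `Answer` types come from the schema.org vocabulary. Whether a given assistant actually reads this markup is not something this sketch can promise.

```python
import json


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


# Hypothetical content: keep each answer short enough to be read aloud.
pairs = [
    (
        "How do I reset my home alarm?",
        "Enter your four-digit code on the keypad, then press the reset button.",
    ),
]

# Serialize to JSON-LD text for embedding in a page.
markup = json.dumps(faq_jsonld(pairs), indent=2)
print(markup)
```

The resulting JSON would typically be embedded in the page inside a `script` tag of type `application/ld+json`.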

Listen First, Then Listen Some More

We don’t know yet what customers will find more appealing, but if they can find information quickly and spend less time searching, it’s already a win. A few simple questions to assess whether a voice project is worth doing:

· Does it save people time?

· Is it more convenient?

· Does it allow users to interact with you on their terms (not yours)?

In a transparent and interactive model, information is power, but power surges back and forth. If customer service means explaining to people why things are the way they are rather than listening to what they have to say, that’s a no-go. Make it an open feedback channel so that people can help you serve them better.

Also, bear in mind that the expectation of someone who just bought a device in 2019 will be very different from the years before. The space is evolving quickly. The longer you wait, the higher the bar is likely going to be.

Where To Start

Ideally, break your project down and do a progressive rollout. If you have a decent API and cloud infrastructure, getting a first skill up and running could take as little as 9 to 12 months. Investments in APIs and cloud help lower the cost of developing voice assistants. Serverless computing, more broadly, makes a ton of sense.

Either way, you will need a voice strategy and a scope definition, then you’ll be ready to test out an SMS “proof of concept” and finally, your voice prototype.

With apps, you need to open the app and log in. Meanwhile, SMS is more direct and simplified. Even if something requires only a couple of clicks, you will lose some people along the way. Not to mention the confusion every time the interface gets a facelift.

1. Start with intelligent alerts by SMS with simple responses (“Your home alarm went off. Was it you? Type Confirm or Deny”). At this stage, basic SMS workflows can even be done using one of many cloud-based services for a quick and dirty proof of concept.

2. Next, try to interpret all responses (“OMG yes, that was me, thanks! 😊”) Interpret emojis to define intent. You can try interpreting photos too. At this stage, the SMS conversation may still lead to a link to a webpage, if the information requested is too complex.

3. Then comes voice. This is a critical advantage of conversational AI; you don’t need to think about the best way to structure your navigation. Instead, you get to focus on the best way to answer the customer’s question and identify all the ways in which a customer may ask a question.
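The first two steps above can be sketched in a few lines. This is only an illustration under stated assumptions: the keyword and emoji sets are invented for the example, the function names are hypothetical, and a real service would sit behind an SMS gateway and likely use a trained intent model rather than word lists.

```python
import re

# Assumed vocabularies for this example only; tune them on real replies.
CONFIRM_WORDS = {"confirm", "yes", "yep", "yeah"}
DENY_WORDS = {"deny", "no", "nope", "not"}
CONFIRM_EMOJI = {"👍", "😊", "✅"}
DENY_EMOJI = {"👎", "😱", "❌"}


def strict_reply(text):
    """Step 1: accept only the exact keywords the alert asked for."""
    word = text.strip().lower()
    if word in ("confirm", "deny"):
        return word
    return "unknown"


def lenient_reply(text):
    """Step 2: guess the intent from free text and emojis."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    confirm = bool(words & CONFIRM_WORDS) or any(e in text for e in CONFIRM_EMOJI)
    deny = bool(words & DENY_WORDS) or any(e in text for e in DENY_EMOJI)
    if confirm == deny:
        # Either no signal at all or conflicting signals: do not guess.
        return "unknown"
    return "confirm" if confirm else "deny"


print(strict_reply("Confirm"))
print(lenient_reply("OMG yes, that was me, thanks! 😊"))
```

Anything that comes back as "unknown" is a good candidate for a clarifying question or a link to a webpage, rather than a guess.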

No chatbot, you ask? Chatbots tend to be disappointing, in part because, when given a keyboard, people tend to phrase much more complex requests, which makes finding an appropriate response a lot more difficult.

A Few Things To Keep In Mind

Uber could not have been made possible without mobile tech and reliable high-speed Internet. But at the moment, what we have are more robust and interactive household radios; we have yet to see the thing that couldn’t have existed without voice.

How can you build an experience that leverages all the strengths of the different ways that you can interact with your customers? For now, ask yourself:

· Is there something people would use more often if it was available?

· Is there something they would do every day? In what context?

· What would people like to get done or figure out without too much thinking?

Be Human, But Not Too Much

Character is consistent while mood changes. Conversations are less formal, and designing them requires getting nuances and inferences right. It’s human-driven, which creates an expectation of “not being a stranger”. Figuring out the right level of persistence will be crucial to delight users. Don’t ask users for their name every time they ask you something. Understand sequences such as “Is it going to rain today? What about tomorrow?” Remember choices and suggest more based on preferences.

Basically, be intimate, but don’t sound too human; that’s just creepy. In terms of speech synthesis, the voice should remind users it’s not a person without sounding like an awful robot.
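To make the “Is it going to rain today? What about tomorrow?” sequence concrete, here is a toy sketch of carrying an intent over from one turn to the next. The `WeatherContext` class, the intent name and the keyword matching are all assumptions made for this example; real platforms handle this with their own dialog and session features.

```python
DAYS = ("today", "tomorrow")


class WeatherContext:
    """Remembers the last intent so a follow-up question can omit it."""

    def __init__(self):
        self.last_intent = None

    def interpret(self, utterance):
        text = utterance.lower()
        # Pick up the day mentioned in this turn, if any.
        day = next((d for d in DAYS if d in text), None)
        if "rain" in text:
            intent = "rain_forecast"
        elif text.startswith("what about") and self.last_intent:
            # Follow-up: reuse the intent from the previous turn.
            intent = self.last_intent
        else:
            intent = None
        if intent:
            self.last_intent = intent
        return {"intent": intent, "day": day}


ctx = WeatherContext()
print(ctx.interpret("Is it going to rain today?"))
print(ctx.interpret("What about tomorrow?"))
```

Without the remembered intent, the second question has a day but nothing to do with it, which is exactly the moment an assistant feels like a stranger.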

Modality Matters

Conversation will be critical within the next 10 years because it lets you listen to customers the way they actually speak. Some things are well suited for voice; others, such as lists, are not. Right now, it’s not so easy to figure out how many people actually look at the screen and what they look at, though that may change in the future.

When people have a full keyboard, they ask complex questions (think about chatbots), while SMS is simpler and more linear (step by step). When designing spoken conversations, think about what level of detail can be spoken and what could be shown as a complement on a screen for newer devices. Don’t be afraid to read out loud!

Context is everything. Sometimes, useful could be as simple as finding plumbers or electricians recommended by you, or the nearest approved auto repair services. Voice isn’t a helpdesk or a chatbot. Propose local insights only available to this user, on the fly when they need it. Suggest things over time as you learn what they like. Don’t try to get too transactional.

To Log Or Not To Log?

Yes, it’s controversial, but do listen to errors and queries. Understand how your skill, action or app is doing, make sure your models are doing well. Identify things that customers are asking but the skill doesn’t know (your conversational “bounce rate”; the times when the answer is “I don’t know”). Make sure the answers provided are accurate. It’s unfortunate, but some things still do require human review.
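One way to put a number on that conversational “bounce rate” is sketched below. The log format (a list of dicts with `query` and `intent` keys) and the `fallback` intent name are assumptions for this example; use whatever your platform actually records.

```python
from collections import Counter


def bounce_report(log, fallback_intent="fallback"):
    """Return the share of turns that hit the fallback and the top missed queries."""
    if not log:
        return 0.0, []
    # Normalize the queries so near-identical requests are counted together.
    missed = [
        turn["query"].strip().lower()
        for turn in log
        if turn["intent"] == fallback_intent
    ]
    rate = len(missed) / len(log)
    return rate, Counter(missed).most_common(3)


# Hypothetical log entries, for illustration only.
log = [
    {"query": "Is it going to rain today", "intent": "rain_forecast"},
    {"query": "Find a plumber", "intent": "fallback"},
    {"query": "find a plumber", "intent": "fallback"},
    {"query": "Play my song", "intent": "fallback"},
]

rate, top_missed = bounce_report(log)
print(f"Bounce rate: {rate:.0%}")
print(top_missed)
```

The list of most frequent missed queries is often more useful than the rate itself, since it tells you what to teach the skill next.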

Discoverability Isn’t The Issue

The so-called discoverability issue may be less of an issue and more of a bias. A conversation is not an app; what matters is the utility and the value. We’re at a point where there isn’t really an “app store” for voice, and we call it an issue. But there was a time when there wasn’t an app store for smartphones (remember that?).

App stores were built so that developers could have an easy way to launch additional functionalities on top of an OS layer that were easily discoverable and could be monetized. Do we need this today? It has become the way our brains are wired, but the real question is “how do you use AI and voice today?” Don’t ask users to reach you where you are if it doesn’t fit their needs.

As to commands, skill descriptions allow users to know what skills do, the key being to keep them simple and to the point so they can be easily remembered. People typically ask their assistant things. At the end of the day, it’s about the value, not about adding capabilities to a device.

Capabilities, within the context of the smart home, come from adding other connected things to your “main OS”. If and how these devices should connect to each other will determine whether we’re talking about the ‘internet of things’ or just ‘things connected to the internet’.

Open Source Or Custom Is Up To You

Whether you should create your own language and intent model or go open source depends on a few factors.

· What does your skill need to be good at?

· Is the terminology specific?

· Where are your customers going to go and where are they now?

· What’s the goal: customer engagement? Affinity? Increase transaction volume?

Domain-specific models let you build context into them. They narrow down the scope for certain expressions to avoid too many clarifying questions. They can be built based on when and where people need them. If your skill is meant to be used on a safari, “jaguar” won’t refer to the same thing as it would in a car insurance claim skill.
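The “jaguar” example can be reduced to a deliberately tiny sketch: the same word resolves to a different entity type depending on the domain the skill was built for. The lexicon contents, the domain names and the `resolve_entity` helper are hypothetical; a real domain-specific model learns this from data rather than from a lookup table.

```python
# Hypothetical per-domain lexicons, for illustration only.
DOMAIN_LEXICONS = {
    "safari": {"jaguar": "animal"},
    "car_insurance": {"jaguar": "vehicle_make"},
}


def resolve_entity(word, domain):
    """Return what a word means in a domain, or None if the skill should ask."""
    return DOMAIN_LEXICONS.get(domain, {}).get(word.lower())


print(resolve_entity("Jaguar", "safari"))
print(resolve_entity("Jaguar", "car_insurance"))
```

A `None` result is the cue for a clarifying question, which is what a well-scoped domain model helps you avoid.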

If you think open source should work fine, consider TensorFlow, an open-source machine learning library from Google.

Caching: A little goes a long way

When it comes to session maintenance, multiple intents (asking three things at once) can be a bit of an issue. Ask yourself “how long should I keep the knowledge of what the user was doing before clearing the cache?”
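One simple way to frame that question is a session cache with a time-to-live. The sketch below is an assumption-laden example: the `SessionCache` class, the 120-second default and the in-memory dictionary are all made up for illustration, and the right lifespan depends entirely on your use case.

```python
import time


class SessionCache:
    """Keeps each user's conversational context for a limited time."""

    def __init__(self, ttl_seconds=120, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store = {}  # user_id -> (expires_at, context)

    def remember(self, user_id, context):
        """Store the context and restart the user's expiry timer."""
        self._store[user_id] = (self.clock() + self.ttl, context)

    def recall(self, user_id):
        """Return the stored context, or None if it is missing or expired."""
        entry = self._store.get(user_id)
        if entry is None:
            return None
        expires_at, context = entry
        if self.clock() >= expires_at:
            del self._store[user_id]
            return None
        return context


cache = SessionCache(ttl_seconds=120)
cache.remember("user-1", {"intent": "rain_forecast", "day": "today"})
print(cache.recall("user-1"))
```

A short lifespan keeps follow-up questions working without making the assistant seem to remember things the user has long since moved on from.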

Mind Your Language

Point, click, tap has been around for a while, but conversations are a completely different story. Writing for voice interfaces doesn’t work like visual or written communication. Voice UX is a big learning curve because conversations are very complex. The way people respond to speech is completely different.

Spend time figuring out appropriate responses and how to phrase them. Bear in mind words look very different when read on paper compared to having Alexa say them back. What looks okay may not sound okay.

It’s a lot more like writing for the radio. The Strunk and White approach is worth studying for designers. Take care not to sound too factual, because voices have an implied tone; think about the customer’s perspective. Some things said out loud don’t really resonate; you need to see them. Lists, addresses and numbers, for instance.

A successful user experience uses language clearly and precisely. Focus on low-risk questions, the kind of answers you could get by SMS. Keep them fairly short. The order in which you present the information is crucial because most people will mainly remember the last bit. People with backgrounds in philosophy and anthropology can help you get it right.

Remember, at first, the predominant use might still be mobile and SMS, because it depends on the penetration rate of voice-enabled devices within homes.

Work On Your Dad Jokes

You spent a lot on your logo, what about your voice? Like quirky jingles on the radio, the recall can be huge. Experiment. Make moodboards of what you want to sound like.

Some people think of voice strategy the same way they approached web strategy: “I have to do it because others are doing it”. What if I told you you could talk to your customers every day, wherever they are? What would you talk about?

Don’t make character and personality an afterthought. Design your sense of humor, and have conversations about gender, avoiding stereotypes, and how to behave when customers chitchat and test you (“I love you”, “Where should we bury the bodies?”). This is the fun part: hide some Easter eggs in there.

And don’t forget to have fun!

How To Design Great Skills For Voice Assistants in 2019 was originally published in Hacker Noon on Medium.

Publication date

07/01/2019 - 13:36