Voice assistants are quickly becoming a fixture in consumers’ daily digital interactions. Experts, including Gary Vaynerchuk, predict voice-driven platforms will spawn hundred-billion-dollar companies. Is voice-based marketing really the next big thing or the next big nothing? (See VR or wearables predictions from just a couple of years ago.) As a content marketer, you may wonder how big an impact voice user interfaces (VUIs) will have. Here are some points about the possible sonic boom (or bust) to consider.
The Case for Voice-Based Marketing
1. Critical Mass
Voice assistants aren’t new: Apple’s Siri and Google Assistant have been available for several years. But in the last 18 months, voice-assisted devices and applications have sprouted like mushrooms, from car-based systems to smart speakers, smart displays, and home appliances. And it appears we’re only now reaching the inflection point for smart devices.
- Smart speaker adoption rates in the U.S. are accelerating faster than those of any other consumer device, ever.
- There are now over one billion devices that provide voice assistance.
- By 2020, over 75% of U.S. homes will have a smart device.
2. Friction-free Trust Factor
We build an intimate relationship with our digital assistant—it already knows our favorite song and how often we order in for dinner. And our reliance on VUIs is increasing. According to Gartner, 30% of web browsing sessions will be voice-based by 2020. It’s easy to volunteer additional information or give verbal approval for access to data you’ve already stored in the cloud.
Voice-based marketing lets brands piggyback on users’ existing trust in Amazon, Google, and other providers. The handoff between voice assistant and branded content is seamless. For example, if you say “Okay Google, talk to Duracell,” it will find the Duracell app and hand off control. It is then up to Duracell to guide the interaction—and reap the demographic benefits.
3. We Have the Tools
Each VUI vendor is eager to encourage more voice-centric content creation to capture additional mindshare and add more value to its respective platform. To that end, each provider has created an ecosystem for content creators to generate, publish, and track voice-based experiences.
Google refers to these experiences as “actions,” and Amazon labels them “skills.” In either case, they are essentially self-contained voice apps that let you create your own branded content and publish to each platform’s user base through an app store-like directory:
- Google Assistant Actions — With over 1 million actions, this is by far the most active ecosystem.
- Alexa Skills — Amazon lists over 50,000 third-party skills, and content creators are adding over 1,000 new skills each week.
Google and Amazon help you build your voice app with cloud-based tools that require no coding. The platforms provide extensive guides to help you plan your voice-based marketing experience and free metrics tools to track usage. Check out Google’s Conversation design site and Amazon’s Alexa Design Guide.
What about Cortana and Siri? Microsoft has a Cortana Skills directory, but its offerings are limited. Apple’s Siri is noticeably quiet here, too. Apple provides no equivalent storefront for branded voice-based content, although its SiriKit SDK lets developers add native Siri support to their apps.
So, we’ve reached critical mass for voice-based devices, reliance on VUIs is increasing, and we have ready access to tools for content creation. Will voice-based marketing be the most important new marketing channel since social media? In a word, no. Here are three reasons why not.
The Case Against Voice-Based Marketing
1. Small Talk
While usage is increasing, it’s largely for simple commands and queries only. This alone is still really cool. At first blush, Alexa sounds like a genius when she can tell you how many yen to the dollar or who Tycho Brahe is. But any conversation beyond this kind of one-and-done search query quickly reveals the limits of the platform.
The various AI engines continue to improve, but as a recent New York Times test showed, even native Google, Alexa, and Siri apps are easily tripped up by complex sentences and homophones. In the near term, the ideal VUI application will remain a single command or search query.
If you want to engage your audience in an ongoing conversation, the solution is to build your own branded action/skill that provides a more complex interaction to handle specific user intent. Which leads us to…
2. The Unhappy Path
It’s possible to build a multi-part “conversation” by scripting out the ideal course you expect the conversation to take. This is referred to as the “happy path,” a concept borrowed from UX or software testing. But building even simple conversations in actions/skills can be tricky. Once the assistant hands the user off to your branded app, you are no longer benefiting from the host platform’s AI to determine intent; it’s all on you.
This means you must build the complex decision trees and incorporate user input as the conversation unfolds. In practice these scripts are relatively brittle. You can try to anticipate all the possible utterances at a specific exchange, e.g., “Yes, Righty-O, Yep, Sure, Okay, Ten-four,” but it’s hard to predict every edge case (“Uh, I think so…”). Anyone who has talked to a digital assistant for more than a few minutes has likely found themselves trying to guess the “magic word.”
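To see why these scripts are brittle, here is a minimal sketch of the exact-match approach described above. It is an illustration only, not how Alexa or Google Assistant process speech; the phrase list and the function names (`normalize`, `is_affirmative`) are hypothetical.

```python
# Hypothetical list of affirmative phrases, taken from the examples in the text.
# A real voice app would need many more, which is exactly the problem.
AFFIRMATIVE = {"yes", "righty-o", "yep", "sure", "okay", "ten-four"}


def normalize(utterance: str) -> str:
    """Lowercase the utterance and strip surrounding spaces and basic punctuation."""
    return utterance.lower().strip(" .!?,")


def is_affirmative(utterance: str) -> bool:
    """Return True only when the whole utterance is a known affirmative phrase."""
    return normalize(utterance) in AFFIRMATIVE


if __name__ == "__main__":
    for reply in ["Yep!", "Ten-four", "Uh, I think so…"]:
        print(reply, "->", is_affirmative(reply))
```

The first two replies match because they appear in the list. The hesitant “Uh, I think so…” falls through, even though a human listener would hear it as a yes. Every phrase you forget to list becomes a dead end for the user.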
To increase the success rate, each exchange must be brief and predictable. In other words, the optimal user experience for voice apps will be similar to that of an Interactive Voice Response (IVR) phone tree app—not exactly groundbreaking.
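A phone-tree style flow of this kind can be sketched as a small lookup table. This is a toy example under stated assumptions: the `SCRIPT` contents, step names, and the `advance` and `run` helpers are all hypothetical and do not come from any vendor’s SDK.

```python
# Hypothetical script: each step has a prompt and a map from the replies
# it expects to the step that follows.
SCRIPT = {
    "start": {
        "prompt": "Would you like some battery tips?",
        "next": {"yes": "size", "no": "goodbye"},
    },
    "size": {
        "prompt": "Which size, AA or AAA?",
        "next": {"aa": "goodbye", "aaa": "goodbye"},
    },
    "goodbye": {"prompt": "Thanks for stopping by.", "next": {}},
}


def advance(step: str, reply: str) -> str:
    """Return the next step for a reply, or the same step so the prompt repeats."""
    options = SCRIPT[step]["next"]
    return options.get(reply.lower().strip(), step)


def run(replies: list[str]) -> str:
    """Feed a list of replies through the script and return the final step."""
    step = "start"
    for reply in replies:
        step = advance(step, reply)
    return step


if __name__ == "__main__":
    print(run(["yes", "AA"]))  # follows the happy path to the end
    print(run(["maybe"]))      # unexpected reply, so the user is stuck at the start
```

Short prompts with two or three expected answers keep the table manageable. Anything off-script simply repeats the prompt, which is the familiar IVR experience.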
Another hurdle on the happy path is time. The current VUI experience is time-bound, and your digital assistant is impatient. You’ve got a limited window in which to respond. But don’t answer too quickly, either: Alexa isn’t really listening to you while she has the mic, and she’ll ignore your too-eager reply if you answer while she’s still speaking.
The experience can easily become less than frictionless.
Patient early adopters may put up with the quirks and miscues in today’s VUIs. But the basic tête à tech of today remains a novelty rather than an immersive voice-only connection to your digital life.
3. Evolutionary, Not Revolutionary
The AI engines driving VUIs are remarkable, but for now they serve as an evolution in input—an extension of the keyboard or touchscreen. For most of our digital interactions, we want to see, not hear, the response. In practice, that means voice will remain a convenient replacement for text entry in mobile micro-moments. Our tongues are replacing our thumbs.
According to The Information, only 2% of Alexa users have made a purchase through the platform. And The Smart Audio Report 2018 from NPR/Edison Research shows that none of the top 20 smart speaker activities involved commerce. This makes sense: buyers might reorder laundry detergent sight unseen, but for most purchases, they will want to see options. We are not ready to rely solely on Google’s “position zero” answer to anything more consequential than finding “the best Thai restaurant near me.”
The True State of the Art
Voice-based marketing may not be the next big thing, but it is definitely the next thing. We will see some creative uses of actions/skills as entertaining surveys, conversation starters, or other promotions, but it’s still early. A recent New York Times article noted that ad agencies expect voice apps to mature over the next five years; however, the current state of the art is a work in progress. One creative director reported that her agency’s current Alexa app is receiving “a lot of cuss words in the user flow” as audiences become frustrated with the suboptimal UX.
In the short term, the real “action” in voice will focus on the increase in long-tail search. When we are searching, we talk more than we type, so VUIs are great for delivering better results against our search intent. As a content creator, the way to prepare for that is to follow standard SEO best practices with an eye toward long-tail search:
- Focus on customer-driven design: Know your audience’s needs.
- Create pages with a primary job: Each page should answer a specific question.
- Speak the language: Write in simple, straightforward phrases that demonstrate authority.
Despite the rising din, it’s still early in the growth phase for voice experiences. The technology is impressive, perhaps daunting, but at the heart of any voice-based system is the content. And the successful voice experiences will be built with the same fundamentals that drive all successful content: knowing your audience, tracking what they care about, and building pathways to help them find what they need as easily as possible.