Path: EDN Asia >> Design Centre >> Consumer Electronics >> Achieve reliable hands-free voice control (Part 2)

Achieve reliable hands-free voice control (Part 2)

11 Feb 2013  | Bernie Brafman


Figure 1 demonstrates how voice activation complements other speech recognition capabilities, showing the steps of a multi-stage process for creating a truly hands-free voice user interface.

Figure 1: The role of voice activation.

However, this important voice-activation step requires a few critical characteristics.

Extremely fast response time. Since voice activation essentially competes with a button press, it must respond as fast or faster. Because the hands-free system uses a probabilistic approach, it can respond without waiting for the recogniser to determine whether the word is even finished. Slow response times lead users to speak before the Step 2 recogniser is ready to listen, which is a major cause of failure.
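To make the early-response idea concrete, here is a minimal sketch, not the vendor's actual algorithm; the scoring scheme and threshold are purely illustrative. It accumulates per-frame evidence for the trigger word and fires as soon as a threshold is crossed, rather than waiting for the end of the utterance:

```python
# Illustrative only: a wake-word detector that accumulates a per-frame
# log-likelihood ratio (trigger model vs. background model) and fires the
# moment the evidence crosses a threshold -- possibly mid-word.

def early_trigger(frame_scores, threshold=8.0):
    """frame_scores: per-frame log P(frame|trigger) - log P(frame|background)."""
    evidence = 0.0
    for i, score in enumerate(frame_scores):
        evidence = max(0.0, evidence + score)  # reset when evidence turns negative
        if evidence >= threshold:
            return i                           # fire immediately: fast response
    return None                                # no trigger in this audio

# A burst of trigger-like frames fires well before the stream ends:
print(early_trigger([-1.0, 2.5, 3.0, 3.5, 2.0, -0.5, -1.0, -1.0]))  # → 3
```

Because the detector decides frame by frame, latency is bounded by how quickly evidence accumulates, not by utterance length, which is what lets a trigger compete with a button press.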

Low power consumption. This technology can deliver "always listening" wake-up triggers with as few as 7 MIPS, with current draw in the 1-10 mA range on today's devices.

Highly accurate even in low-SNR environments. This means several things:

 • Works in high noise – Truly Handsfree Voice Control performs virtually flawlessly in extremely loud environments, including music playing in the background, in a car, or even outdoors.
 • Works without a microphone in close proximity – it is responsive at distances of up to 20 feet (in a relatively quiet environment) and at arm's length in noise. This is critical because VUI-based applications will become commonplace in a wide variety of consumer electronics devices, and users won't want to get up and walk over to their devices to control them.

Companies such as Nuance, Google and Microsoft are prominent in the second step, which is a powerful (often cloud-based) recognition system.

The third step, "Understanding Meaning", is what the original Siri was all about. This was an AI component developed under DARPA funding at SRI and later spun off and acquired by Apple. Nuance's Vlingo does a really nice job of implementing Steps 1-3. It's very likely that Google, Microsoft, Apple and Nuance all have efforts underway in AI and natural language understanding.

The SEARCH in Step 4 is done via typical search engines (Google, Microsoft, Apple), and the independent players have likely developed partnerships in these areas.

Step 5 requires a good-quality Text-to-Speech (TTS) engine. Providers like Nuance, Ivona, AT&T, NeoSpeech and Acapela all have quality TTS engines, and no doubt Apple, Microsoft and Google have in-house solutions as well.

Mobile applications for smartphones, tablets and ultrabooks benefit from hands-free voice control in both safety and convenience: applications can wake up and be controlled without touching the handset, whether in the car or across the room. With SDKs for iOS and Android that include a medium-vocabulary recogniser, voice triggers and extensive command menus can be combined with cloud-based recognisers, creating a rich hybrid user experience when connected and extensive control capabilities when not. Response time is so fast that no pause between the trigger and the command is necessary; for example, "Hello computer, what time is it in Tokyo?"
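The hybrid routing described above can be sketched as follows. This is a toy illustration under stated assumptions, not any actual SDK: a real system operates on audio rather than strings, and every name here is hypothetical.

```python
# Toy sketch of the hybrid flow: an always-listening on-device trigger,
# then cloud recognition when connected, or a local command menu when not.

TRIGGER = "hello computer"

def recognise(utterance, connected, local_commands):
    """Route an utterance through trigger spotting, then cloud or local."""
    if not utterance.startswith(TRIGGER):       # ignore non-trigger speech
        return None
    command = utterance[len(TRIGGER):].strip()  # no pause needed after trigger
    if connected:
        return ("cloud", command)               # large-vocabulary cloud recogniser
    if command in local_commands:
        return ("local", command)               # on-device command menu
    return ("local", "not recognised")

print(recognise("hello computer what time is it in Tokyo", True, {"call home"}))
# → ('cloud', 'what time is it in Tokyo')
```

The design point is that the trigger and routing run entirely on the device, so the system stays responsive and private when offline, while deferring open-ended queries to the cloud when a connection exists.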

Triggers can be made contextual; for example, if a phone number is included in a text message or email, a trigger such as "Dial the number" can be activated. Uniquely, these SDKs also support using voice triggers as Speaker Verification or Speaker Identification phrases. In these scenarios, one or more users enroll by speaking the phrase a few times. Once enrolled, the trigger can be used as a voice password in the case of Speaker Verification, rejecting any other speaker, or to identify a speaker from the group of enrolled users in the case of Speaker Identification, so that each user's preferences may be retrieved. Both predefined "hard-coded" triggers and user-defined triggers can be implemented on the device for further personalisation (and combined with Speaker Verification/Identification).
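As a rough illustration of the enroll-then-verify idea, here is a toy distance-based sketch, not the SDK's actual method; feature extraction is omitted and all numbers are made up:

```python
# Toy speaker verification on a trigger phrase: enroll by averaging a few
# feature vectors (one per repetition), then accept only utterances whose
# features fall close to the enrolled template.

def enroll(samples):
    """Average several feature vectors into a speaker template."""
    n = len(samples)
    return [sum(dim) / n for dim in zip(*samples)]

def verify(template, features, threshold=1.0):
    """Accept the speaker if the Euclidean distance is within threshold."""
    dist = sum((a - b) ** 2 for a, b in zip(template, features)) ** 0.5
    return dist <= threshold

alice = enroll([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2]])  # template ≈ [1.0, 2.0]
print(verify(alice, [1.1, 2.1]))  # enrolled speaker: True
print(verify(alice, [4.0, 5.0]))  # different speaker: False
```

Speaker Identification follows the same pattern with one template per enrolled user: the incoming phrase is compared against all templates and attributed to the nearest one, which is how the device can retrieve that user's preferences.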

