Achieve reliable hands-free voice control (Part 2)

11 Feb 2013  | Bernie Brafman

Memory requirements depend on vocabulary size. A single phrase (such as a trigger) requires 20 KB of RAM and 130 KB of ROM/Flash (depending on the hardware platform chosen), whereas a 500-phrase vocabulary requires 250 KB of RAM and 700 KB of ROM/Flash. Processing requirements range from 7 to over 100 MIPS depending on the complexity of the task. For a low-power hands-free voice trigger, where the goal is to minimise power consumption, a system can be designed to use just 7 MIPS.

Current implementations of triggers on handsets run at the application level and can affect battery life in continuous listening mode (contextual triggers have much less impact). A well-designed system with minimal hardware activated and a digital microphone would draw well under 10 mA. Roadmap items include deeply embedded implementations of the trigger technology, developed with key component suppliers in mobile architectures. These will dramatically reduce memory requirements and bring current draw down to close to 1 mA, making continuous listening a broadly available feature in the very near future.
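To put those current figures in perspective, the sketch below converts a constant current draw into continuous listening time. The 1500 mAh battery capacity is an assumed, illustrative value that does not come from the article, and the calculation ignores every other load on the handset.

```python
def listening_hours(battery_mah: float, draw_ma: float) -> float:
    """Hours a battery of the given capacity lasts at a constant current draw."""
    if draw_ma <= 0:
        raise ValueError("draw_ma must be positive")
    return battery_mah / draw_ma


# Assumption: a hypothetical 1500 mAh handset battery, used only for illustration.
ASSUMED_BATTERY_MAH = 1500.0

scenarios = [
    ("application-level trigger (10 mA upper bound)", 10.0),
    ("deeply embedded trigger (about 1 mA)", 1.0),
]

for label, draw_ma in scenarios:
    hours = listening_hours(ASSUMED_BATTERY_MAH, draw_ma)
    print(f"{label}: {hours:.0f} h ({hours / 24:.1f} days)")
```

Under these assumptions, cutting the draw from 10 mA to 1 mA extends listening time tenfold, which is why the deeply embedded approach matters for an always-on trigger.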

Bluetooth technology is most often associated with the term "hands free" but, ironically, has needed a button push to use. There is now a Voice Control solution for CSR-based headsets and car kits. It offers a dramatic leap forward in user friendliness by removing the need for confusing button presses, LED flashes, tone beeps, and manual interaction with the phone while driving, and it has been adopted by leading headset and car kit manufacturers.

Typical Bluetooth headset users frequently take their eyes off the road to look at the phone's display to determine Caller ID before answering, to check headset connection status, or to locate and push buttons for speed-dial calling. This solution enables the market to move to a "hands on the wheel, eyes on the road" usage model, with hands-free Voice Control that dramatically improves safety and features while also reducing user confusion.

The solution includes speaker-independent voice triggers and voice commands that replace button pushes and control operations such as pairing, connection, dialling and battery check. The user can always ask for help with the command "What Can I Say?" Commands are activated with a trigger phrase such as "Hello Blue Genie". Beam forming is supported for dual-microphone devices, with an improvement in noise of up to 7 dB (beam forming does not alter the audio spectral characteristics in the way that noise reduction techniques do).
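The trigger-then-command interaction can be sketched as a two-state controller. This is a minimal illustration and not the product's implementation: it assumes a recogniser has already turned audio into text, and apart from the trigger phrase and "What Can I Say?", the command phrases and action names are hypothetical.

```python
TRIGGER = "hello blue genie"  # trigger phrase quoted in the article

# Hypothetical command set; the action names are invented for illustration.
COMMANDS = {
    "what can i say": "list_commands",
    "pair mode": "start_pairing",
    "check battery": "report_battery",
}


class VoiceControl:
    """Idle until the trigger phrase is heard, then accept exactly one command."""

    def __init__(self) -> None:
        self.armed = False

    def hear(self, phrase: str):
        """Return an action name for a recognised command, otherwise None."""
        text = phrase.lower().strip(" ?!.")
        if not self.armed:
            # While idle, everything except the trigger phrase is ignored.
            self.armed = text == TRIGGER
            return None
        # One command attempt per trigger, then return to idle.
        self.armed = False
        return COMMANDS.get(text)


vc = VoiceControl()
for phrase in ["check battery", "Hello Blue Genie", "What Can I Say?"]:
    print(repr(phrase), "->", vc.hear(phrase))
```

Ignoring everything until the trigger is heard is what allows the device to listen continuously without acting on ordinary conversation.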

For systems with access to the Caller ID name, a small-footprint embedded TTS solution can announce a caller's name with a highly intelligible synthesised pronunciation before letting the user "answer" or "ignore" the call with a noise-robust voice trigger. The TTS output can be "morphed" so that any voice in any language can sound male, female, adult, child, or alien. This can also be done in real time with the user's own voice, for fun or security. Further, custom prompts can be recorded and compressed for later playback using high-quality, low-bit-rate voice compression.

Similarly, automotive infotainment systems can use voice triggers to replace the button presses that start speech dialogs, since those presses still distract the driver. Many current and planned systems are based on powerful processing platforms (or integrated handsets) that can easily support hands-free Voice Control. In the car, the Speaker Identification feature can be used to retrieve a driver's preferences, such as seat position and music selections, and BlueGenie-like commands can be used for incoming and outgoing calls. As with mobile applications, the medium-vocabulary capabilities of an embedded speech recogniser can create a rich user experience when combined with cloud-based recognisers, while still offering extensive capabilities when not connected ("Hello family sedan (saloon?) navigate to home").
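One way to picture the embedded-plus-cloud combination is a dispatcher that tries the on-device vocabulary first and uses a cloud recogniser only when a connection exists. This is a sketch under stated assumptions: the local-first ordering is a design choice made here, not something the article specifies, and every function name, phrase and intent name below is hypothetical.

```python
from typing import Callable, Optional

# A recogniser maps an utterance to an intent name, or None if it has no match.
Recogniser = Callable[[str], Optional[str]]

# Hypothetical on-device vocabulary; intent names are invented for illustration.
EMBEDDED_VOCABULARY = {
    "navigate to home": "nav_home",
    "call office": "dial_office",
}


def embedded_recogniser(utterance: str) -> Optional[str]:
    """Look the utterance up in the small on-device vocabulary."""
    return EMBEDDED_VOCABULARY.get(utterance.lower().strip())


def fake_cloud_recogniser(utterance: str) -> Optional[str]:
    """Stand-in for a network service with a much larger vocabulary."""
    return "cloud_query:" + utterance.lower().strip()


def recognise(utterance: str, embedded: Recogniser, cloud: Recogniser,
              connected: bool) -> Optional[str]:
    """Prefer the embedded result; fall back to the cloud only when connected."""
    local = embedded(utterance)
    if local is not None:
        return local
    if connected:
        return cloud(utterance)
    return None


print(recognise("Navigate to home", embedded_recogniser, fake_cloud_recogniser, False))
print(recognise("find a coffee shop", embedded_recogniser, fake_cloud_recogniser, True))
print(recognise("find a coffee shop", embedded_recogniser, fake_cloud_recogniser, False))
```

With this ordering, the commands in the embedded vocabulary keep working without a connection, and only out-of-vocabulary requests depend on the network.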
