Implementing voice processing for smart home apps

26 Feb 2015  | Vineet Ganju, Trausti Thormundsson

For speech control applications, truly full-duplex acoustic echo cancellation (AEC) is a necessary component of the system wherever speech control must work concurrently with playback. For an AEC to work well, it needs access to the signal being played from the device, i.e. the echo reference. The AEC then uses the echo reference to build a linear model of the acoustics of the echo path in the room. In real systems, however, the echo path often contains considerable non-linearities that degrade performance, for example when the device is trying to generate loud playback from small loudspeakers. Another example is non-linear post-processing applied to the playback signal after it has been sent to the AEC as the echo reference. This is the case in a speech-controlled set-top box (STB): the AEC runs in the STB and the echo reference is obtained there, but the TV will most likely add some unknown delay and post-processing to the audio before playing it out. A conventional AEC performs poorly under these conditions.
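The linear modelling step can be illustrated with a normalised least-mean-squares (NLMS) adaptive filter, a common textbook choice for echo cancellation. This is only a minimal sketch and not the algorithm used in the product discussed in this article. The function name, the tap count, the step size and the synthetic three-tap echo path are all assumptions made for illustration.

```python
import random


def nlms_echo_canceller(reference, microphone, num_taps=8, step=0.5, eps=1e-8):
    """Subtract a linear NLMS estimate of the echo from the microphone signal.

    Returns the residual (echo-cancelled) signal and the final filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent echo-reference samples, newest first
    residual = []
    for x, d in zip(reference, microphone):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        norm = sum(h * h for h in history) + eps
        # NLMS update: step size normalised by the reference energy.
        weights = [w + step * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


def power(signal):
    return sum(s * s for s in signal) / len(signal)


# Hypothetical test signal: white-noise playback through a purely linear path.
random.seed(0)
echo_path = [0.6, -0.3, 0.1]  # assumed room response, illustration only
reference = [random.uniform(-1.0, 1.0) for _ in range(4000)]
microphone = [
    sum(echo_path[k] * reference[n - k] for k in range(len(echo_path)) if n - k >= 0)
    for n in range(len(reference))
]

residual, weights = nlms_echo_canceller(reference, microphone)
echo_power = power(microphone[-500:])
residual_power = power(residual[-500:])
```

When the echo path really is linear and the reference is exact, as in this synthetic case, the residual shrinks to almost nothing. Loudspeaker distortion, downstream post-processing or an unaligned reference break that assumption, which is the failure mode the paragraph above describes.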

This problem can be solved by connecting the AEC to the noise reduction technology described in the previous section. As long as the AEC can distinguish between far-end, near-end and double-talk activity, this information can be used as the activity detection input to the USF. This approach provides truly full-duplex AEC performance in systems that have non-linearities and/or an impaired echo reference.
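To make the three activity states concrete, the sketch below labels one audio frame using simple energy thresholds. It is a deliberately naive stand-in and not the detector described in this article. The function names, the threshold values and the choice of measuring near-end activity on a separate near-end frame are all assumptions.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def classify_activity(reference_frame, near_end_frame,
                      far_threshold=1e-4, near_threshold=1e-4):
    """Label a frame as far-end, near-end, double-talk or silence.

    reference_frame: samples of the echo reference (playback) for this frame.
    near_end_frame:  samples assumed to carry mainly near-end speech, for
                     example the AEC output (an assumption of this sketch).
    """
    far_active = frame_energy(reference_frame) > far_threshold
    near_active = frame_energy(near_end_frame) > near_threshold
    if far_active and near_active:
        return "double_talk"
    if far_active:
        return "far_end"
    if near_active:
        return "near_end"
    return "silence"


# Hypothetical frames, illustration only.
loud = [0.1, -0.1, 0.1, -0.1]
quiet = [0.0, 0.0, 0.0, 0.0]
labels = [
    classify_activity(loud, quiet),
    classify_activity(quiet, loud),
    classify_activity(loud, loud),
    classify_activity(quiet, quiet),
]
```

A label of this kind, computed per frame, is the sort of activity information that could be passed on to the noise reduction stage.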

Additionally, this new AEC technology should include a delay estimation algorithm that aligns the echo reference with the microphone signal, compensating for the unknown delay in the echo path, as in the STB case.
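One common way to estimate such a delay is to look for the lag that maximises the cross-correlation between the echo reference and the microphone signal. The sketch below shows that idea under simplifying assumptions: a single fixed delay, no near-end speech, and a brute-force search. The function names and the synthetic 37-sample delay are hypothetical, and nothing here is claimed to match the algorithm used by the authors.

```python
import random


def estimate_delay(reference, microphone, max_delay):
    """Return the lag in samples with the largest cross-correlation magnitude."""
    best_lag, best_score = 0, -1.0
    for lag in range(max_delay + 1):
        score = abs(sum(
            reference[n - lag] * microphone[n]
            for n in range(lag, len(microphone))
        ))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def align_reference(reference, lag):
    """Delay the reference by `lag` samples so it lines up with the microphone."""
    return [0.0] * lag + reference[: len(reference) - lag]


# Hypothetical test signal: the TV delays and attenuates the playback.
random.seed(1)
true_delay = 37
reference = [random.uniform(-1.0, 1.0) for _ in range(2000)]
microphone = [0.0] * true_delay + [0.5 * s for s in reference[:-true_delay]]

estimated = estimate_delay(reference, microphone, max_delay=100)
aligned = align_reference(reference, estimated)
```

Once the reference has been aligned, the adaptive filter only has to model the room response rather than spend its taps on the bulk delay.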

Figures 8 and 9 show the performance of an STB system. The user is 3 m from the TV, and a microphone module sits on top of the TV and is connected to the STB. The user is giving natural language commands to the STB. At the microphone module, the SPL of the desired speech is 60 dB and the SPL of the echo from the TV playback content is 72 dB. The top part of Figure 8 shows the unprocessed microphone signal; the bottom part shows the processed microphone signal. Figure 9 shows the spectral content of the residual echo before and after processing. For this case the word error rate (WER) was 100% before processing and 8% after processing.
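The two SPL figures quoted above imply how far the echo exceeds the speech at the microphone. The short calculation below works this out. The variable names are illustrative, and the only inputs are the 60 dB and 72 dB values from the text.

```python
speech_spl_db = 60.0  # desired speech at the microphone module (from the text)
echo_spl_db = 72.0    # TV playback echo at the microphone module (from the text)

# Signal-to-echo ratio in dB: negative means the echo is louder than the speech.
signal_to_echo_db = speech_spl_db - echo_spl_db

# Convert the 12 dB gap into linear ratios (echo relative to speech).
echo_to_speech_pressure = 10 ** (-signal_to_echo_db / 20)  # about 4x in amplitude
echo_to_speech_power = 10 ** (-signal_to_echo_db / 10)     # about 16x in power

print(f"Signal-to-echo ratio: {signal_to_echo_db:.0f} dB")
print(f"Echo pressure is {echo_to_speech_pressure:.2f}x the speech pressure")
print(f"Echo power is {echo_to_speech_power:.1f}x the speech power")
```

In other words, the recogniser has to pick out speech that sits 12 dB below the echo, which is consistent with the 100% WER reported before processing.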


Figure 8: The top part of this graph shows the unprocessed microphone signal, and the bottom part shows the processed microphone signal.


Figure 9: This plot shows the spectral content of the residual echo before and after processing.


Conclusion
Conventional beamforming speech enhancement methods often fall short of providing an acceptable solution in smart home far-field conditions. It therefore becomes imperative to look at other systems that can address these far-field challenges. For example, Conexant has developed cost-effective, highly integrated solutions like the one described in this article. They combine high dynamic range ADCs, far-field noise and interference reduction that works at low SNR and low DRR without knowledge of the direction of the speech or the noise, and truly full-duplex acoustic echo cancellation even when the echo signal is not completely known. Conexant has deployed these solutions on many production platforms, from smart home devices to tablets, PCs and wearables.

Conventional methods such as beamforming require significant microphone cost, platform-specific tuning and many constraints on microphone location, matching and directionality of the speech and noise. The robustness of the alternative solutions described translates directly into better performance and significant cost savings during the development and manufacturing of new smart home products.


About the authors
Vineet Ganju is the Executive Marketing Director of the audio business unit at Conexant. Vineet has spent over 17 years in the semiconductor industry with most of that time spent in the consumer and automotive audio segments. Vineet's experience spans audio DSPs, mixed-signal products, amplifiers and algorithms.

Trausti Thormundsson is the Audio Chief Technology Officer at Conexant. He has worked for over 17 years in the semiconductor industry in the field of digital communication and audio/voice processing, gaining significant experience in research and development, project management and customer support. Trausti has authored and co-authored more than 20 published or pending patents. He serves on the board of directors at Controlant. He has an MSc degree in Electrical Engineering from Stanford University in California.

