All you need to know about Voice Recognition

By Hitesh Raj Bhagat & Karan Bajaj, ET Bureau: Apple recently demonstrated the power of voice recognition using Siri on the iPhone 4S.

But the technology has been around for a while now and you can easily get it for your PC, Mac or smartphone. Hitesh Raj Bhagat and Karan Bajaj explain how.

Apple's Siri demonstration at the launch of the iPhone 4S and its subsequent marketing videos have raised a lot of interest in voice and speech recognition. You've probably used some variation of the technology when using voice dialling on your phone or speaking to an automated IVR system. For personal use, adoption has been low for several reasons: poor accuracy, a training period before you can actually start using it and a limited number of real-world applications. Over the years, the technology has evolved and now there are several mobile apps and computer programs available to help you talk to a machine.

A BRIEF HISTORY

Speech recognition was first introduced to personal computing around 10 years back - around the time Windows 98 was introduced. However, you may be surprised to know that research on this technology started way back in 1936.

WHAT IS VOICE RECOGNITION?

Voice recognition and speech recognition are two different terms. Voice recognition relates to identifying an individual voice - along the same lines as a biometric scanner. Speech recognition, on the other hand, relates to identifying spoken words in the correct sense and then translating them into computer language.

HOW IT WORKS

Both speech and voice recognition work on the principle of translating 'analog' spoken words into 'digital' signals that a machine can understand. As simple as this may sound, it requires a lot of back-end processing, all the while compensating for differences in dialect, volume levels, tempo and pronunciation. Once the analog speech signals have been converted, the result is sent back to the device in digital format, and the device in turn executes a command. Because speaking out a line takes mere seconds, translation, conversion and execution need to be done on the fly - thus the need for a fast data connection to transfer the data to and fro.
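To make the analog-to-digital step a little more concrete, here is a minimal Python sketch of how a continuous sound wave might be turned into a list of numbers. It is an illustration only, not how Siri or any product named here actually works: the function name, the 440 Hz test tone, the 16,000 samples-per-second rate and the 16-bit depth are all assumptions chosen for the example.

```python
import math


def sample_and_quantize(signal, duration_s, sample_rate_hz, bits):
    """Turn a continuous signal into a list of integers.

    `signal` is a function of time (in seconds) returning a value
    between -1.0 and 1.0. All names here are illustrative.
    """
    max_level = 2 ** (bits - 1) - 1  # 32767 for 16-bit audio
    n_samples = round(duration_s * sample_rate_hz)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                  # time of this sample
        value = max(-1.0, min(1.0, signal(t)))  # clip to the valid range
        samples.append(round(value * max_level))  # store as an integer
    return samples


def tone(t):
    """A stand-in for the 'analog' voice: a pure 440 Hz tone."""
    return math.sin(2 * math.pi * 440 * t)


# One hundredth of a second of sound becomes 160 whole numbers.
digital = sample_and_quantize(tone, duration_s=0.01,
                              sample_rate_hz=16000, bits=16)
print(len(digital), digital[:5])
```

A list of numbers like this is what gets sent over the data connection; recognising the words in it is the far harder back-end job described above.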

WHAT CAN YOU USE IT FOR?

Speech-to-text and controlling a machine using your voice are the obvious uses. But the technology also holds promise for those with disabilities. Applications like DriveSafe.ly for your phone can read out text messages and emails for you - helpful for the visually impaired. Various apps also allow you to search the web or type out messages by speaking - helpful for those with limited motor control.

WHAT COMES NEXT?

The biggest challenge that any speech recognition system faces today is deciphering the various dialects and accents that people may have. Plus, in natural speech, we often tend to use a lot of slang, which automated systems find hard to understand. The first step would be to build a system that overcomes these current issues. A possible application, then, would be a universal, real-time voice translator of the kind often seen in sci-fi movies - simply speak and a device will be able to instantly speak out the same in any language with 100% accuracy. Going forward, there are also going to be major developments in speech understanding - true artificial intelligence, when a machine can truly grasp the context of what you're saying and talk back, rather than just recognising the words.

SIRI

Siri is the personal assistant that Apple has introduced on the new iPhone 4S. The app is deeply integrated with the operating system and responds to your natural speaking voice. It can be used to make calls, write SMS, set reminders or answer questions with real-time results from the internet. The app adapts to your preferences and style of speaking, and takes interactivity to an all-new level. As of now, Siri only supports English, German and French. Plus, it will only be available for iPhone 4S users.

ON A PC

Windows Speech Recognition: Free

Windows 7 comes with its own system (found under Ease of Access). It allows you to control your PC using a set of voice commands and also offers dictation of text.

Tazti (www.tazti.com)

$39.99 (price for 2PC license)

Tazti lets you control iTunes as well as various browser functions such as search and navigation via voice commands. It also includes a dictation feature.

ON A MAC

Dragon Dictate (www.nuance.com) $199.99

Dragon Dictate not only lets you input text by speaking, but also controls various functions like launching of programs and general navigation.

Speech: Free; OS X has a set of voice commands, accessible from the Speech section, that allow various applications to be controlled. It also has a built-in text-to-speech engine.

IT'S NOT MAGIC, SO KEEP IN MIND THAT...

Voice recognition requires use of the microphone, be it a computer or a handheld device. High levels of ambient sound will affect the accuracy of recognition.

Do not skip the initial setup process to set up the microphone, speakers and volume. This makes sure that your hardware is optimised for accuracy.

Some of the smarter software solutions will adapt to your style of speaking. The more you use it, the better and more accurate it will get.

Speech to text will never be 100% accurate out of the box. Most software will give you about a 60% accuracy rate that improves to around 80% over time.

A fast data connection (3G or Wi-Fi) is required for voice recognition to work quickly and reliably. This is because processing is done server side.
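The accuracy percentages above are quoted without a definition. One common way to put a number on speech-to-text accuracy is to count how many word-level mistakes (wrong, missing or extra words) separate what was said from what the software typed. The Python sketch below shows that idea; the function name and the example sentences are assumptions made for illustration, and the measure is not necessarily the one behind the figures quoted here.

```python
def word_accuracy(reference, hypothesis):
    """Return the share of the spoken words that were recognised correctly.

    Counts the fewest word substitutions, deletions and insertions
    needed to turn `hypothesis` into `reference` (word-level edit
    distance), then converts that into an accuracy between 0 and 1.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        return 0.0

    # prev[j] = edits needed to match the first j recognised words
    # against the reference words processed so far.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word was dropped
                            curr[j - 1] + 1,      # extra word was added
                            prev[j - 1] + cost))  # word matched or was wrong
        prev = curr

    errors = prev[-1]
    return max(0.0, 1 - errors / len(ref))


spoken = "call my mother at home"
typed = "call my brother at home"
print(f"{word_accuracy(spoken, typed):.0%}")
```

Here one wrong word out of five gives 80% accuracy, which shows how even a "good" score can still mean one slip per short sentence.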

SMARTPHONE APPS YOU CAN SPEAK TO

Google Search: Free; Search the internet by just speaking out your queries. Available for iOS, Android & BlackBerry

Voice Actions: Free; This app lets you control your Android phone using voice. You can get directions, make calls or send SMS

Edwin: Free; Edwin for Android responds to your queries in real time. It can make calls, send SMS, tweet and do a host of other things

Speaktoit Assistant: Free; This Android app has a virtual character that responds verbally to questions and notifies you about events

Vlingo: Free; Vlingo is the closest alternative to Apple's Siri in terms of recognition and features. It works across platforms and offers search as well as command execution