Human-friendly interactions of speaking and hearing

(Speech recognition & Speech technology)
By Sangeetha Sridhar

Talk to computers and hear them speak

Processing information and interacting with computing systems have advanced by leaps and bounds in several directions, and one area that brings humans closer to such systems is ‘Speech Recognition’. In simple terms, it is the ability to give inputs and instructions to computer-driven systems as naturally as speaking to them, and to receive information as audio instead of text to read.

Many people find working with a keyboard and mouse tiring and feel that it affects their health in the long run. If tools exist that let users enjoy content without straining to type in keywords, more people could readily adopt technology, including some who are physically disadvantaged, such as the blind.

Modes of speech technology

There are two areas where speech technology can make a considerable impact. One is reading digital content out as audio, so that the user does not have to view it. The other is the use of audio commands to operate various software systems.

Even among systems where audio inputs drive actions, there are two major kinds of application: one with a limited vocabulary and a wide range of users, and the other with a large vocabulary and a limited number of users.

Common Applications

Speech recognition systems recognize a user’s voice inputs and respond through an automated system. Alternatively, they may read out free-flowing text from a web page as audio. The former application is used to answer hotline queries based on a pre-defined set of phrases or menu choices.

The latter application, however, makes web content accessible to all, including those who are disabled. For example, an extra application sits on top of the browser interface and reads out the textual content for the user. This is broadly called text-to-speech technology.

Systems that are tuned to one or a few users, by repeatedly recording their conversations to identify voice patterns and accents and map them to the words actually intended, are used for transcription in fields such as medicine, teaching and learning. Systems used for these conversions are called speech-to-data systems.
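
The hotline case described above, where a caller's words are matched against a pre-defined set of phrases, can be illustrated with a small sketch. It assumes the audio has already been turned into rough text by some recognizer; the menu phrases, the action names and the function name match_phrase are all made up for illustration, and the fuzzy text matching from Python's standard difflib module merely stands in for whatever matching a real system would use.

```python
import difflib

# Hypothetical menu of pre-defined phrases and the action each one triggers.
MENU = {
    "check balance": "BALANCE",
    "report lost card": "LOST_CARD",
    "speak to an agent": "AGENT",
}


def match_phrase(heard, menu=MENU, cutoff=0.6):
    """Return the action for the menu phrase closest to what was heard.

    Returns None when nothing in the limited vocabulary is similar enough.
    """
    # Normalize case and spacing before comparing.
    normalized = " ".join(heard.lower().split())
    candidates = difflib.get_close_matches(
        normalized, list(menu), n=1, cutoff=cutoff
    )
    return menu[candidates[0]] if candidates else None


if __name__ == "__main__":
    print(match_phrase("chek balanse"))  # an imperfect transcription
    print(match_phrase("xyzzy"))         # outside the vocabulary
```

Because the vocabulary is small, even an imperfect transcription can usually be pulled back to the nearest valid choice, which is one reason such systems can serve a wide range of callers.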

Intelligence built-in

In speech-to-text systems, the words are split into phonemes, which are then mapped onto a table of recognized phonemes and their corresponding syllables. Depending on the sophistication of the software and hardware, this table can grow dynamically to include new entries through speech training sessions. Such applications are used to create electronic medical transcripts from doctors’ audio recordings.

Readers who wish to experiment with speech recognition technology can install voice-command applications on their mobile phones or PCs, go through a training session to identify ambient signal noise, and later use their natural voice to give commands such as ‘Dial Sangeetha’ or ‘Click File Open’. Users of Windows can activate its ‘speech recognition’ application and train it to work reasonably well with voice commands.
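
The table lookup described above can be pictured with a toy sketch. It is only an illustration under simplifying assumptions: the phoneme symbols, the starting entries and the class name PhonemeTable are invented, and the longest-match lookup used here is a stand-in for the statistical models that real recognizers rely on.

```python
class PhonemeTable:
    """Toy table mapping phoneme sequences to syllables (illustrative only)."""

    def __init__(self):
        # Hypothetical starting entries: phoneme tuple -> syllable text.
        self._table = {("D", "AY"): "di", ("AH", "L"): "al"}

    def train(self, phonemes, syllable):
        """Add or update an entry, as a training session might."""
        self._table[tuple(phonemes)] = syllable

    def decode(self, phonemes):
        """Map a phoneme list to syllables, preferring the longest match.

        Phonemes with no matching entry are emitted as "?".
        """
        out, i = [], 0
        max_len = max((len(key) for key in self._table), default=0)
        while i < len(phonemes):
            for size in range(min(max_len, len(phonemes) - i), 0, -1):
                key = tuple(phonemes[i:i + size])
                if key in self._table:
                    out.append(self._table[key])
                    i += size
                    break
            else:
                # Nothing in the table starts here; mark it and move on.
                out.append("?")
                i += 1
        return out


if __name__ == "__main__":
    table = PhonemeTable()
    print(table.decode(["D", "AY", "AH", "L"]))
    print(table.decode(["F", "AY", "L"]))   # unknown before training
    table.train(["F", "AY", "L"], "file")
    print(table.decode(["F", "AY", "L"]))   # known after training
```

The train method is where the table "grows dynamically": a sequence that was unrecognized before a training session gets its own entry afterwards.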

Scope for research

Current systems perform very poorly in the presence of ambient noise. The prime challenge is in cancelling the noise element, extracting the pure audio and studying its original composition. Most such systems work on a one-user-at-a-time basis only, so multi-user filtering is currently under study and research.

Most languages have words that sound similar but mean something entirely different. So the suggestion of the most appropriate word in context is learnt and improved over several sessions. Such words and their text are held in databases, which can be searched when a mapping is required.

Languages have their own rich collections of similar-sounding words, and human dialects, moods, accents and tempo make speech recognition a big challenge. Beyond interactive, menu-driven word recognition, it will be over a decade before a system with an acceptable level of accuracy is available in the market.
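
How a suggestion might be "learnt and improved over several sessions" can be shown with a very small sketch. Everything in it is an assumption made for illustration: the class name HomophoneChooser, the sound key "tu", and the idea of using only the previous word as context. A real system would hold far richer context in its database.

```python
from collections import Counter, defaultdict


class HomophoneChooser:
    """Pick a spelling for a sound based on what followed each word before."""

    def __init__(self, groups):
        # groups: sound key -> list of spellings that sound alike.
        self.groups = groups
        # previous word -> how often each spelling was confirmed after it.
        self.counts = defaultdict(Counter)

    def learn(self, previous_word, spelling):
        """Record one confirmed choice from a session."""
        self.counts[previous_word.lower()][spelling] += 1

    def choose(self, sound, previous_word):
        """Return the spelling seen most often after previous_word.

        With no history, the first spelling listed for the sound is used.
        """
        seen = self.counts.get(previous_word.lower(), Counter())
        return max(self.groups[sound], key=lambda word: seen[word])


if __name__ == "__main__":
    chooser = HomophoneChooser({"tu": ["to", "two", "too"]})
    chooser.learn("page", "two")
    chooser.learn("page", "two")
    chooser.learn("page", "to")
    print(chooser.choose("tu", "page"))  # history favours one spelling
    print(chooser.choose("tu", "go"))    # no history for this context
```

Each call to learn plays the role of one corrected suggestion in a session, so the choice for a given context shifts as more sessions accumulate.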

The author is a technology evangelist working as a Consultant at the Information Technology Authority of Oman and can be contacted at sendsangita@gmail.com, through her blog at http://digitaloman.blogspot.com or on Twitter at http://twitter.com/sangitasri.