  • Essay / Speech Recognition - 1021

    Speech recognition is the process by which a computer listens to what you say and converts it into written text. Given how fast and powerful computers are, this may seem like a simple task, but it is quite the opposite. Most recognition software can achieve between 98% and 99% accuracy under optimal conditions. Optimal conditions assume that the user's voice characteristics match the training data, that the system can adequately adapt to the speaker, and that the user works in a quiet environment (e.g., a quiet office or laboratory).

    The two essential steps that a speech recognition system must accomplish are training and decoding. There are two classes of speech recognition: speaker-independent systems, which have a small vocabulary of words or commands, and speaker-dependent systems, which have a very large vocabulary but must be trained for each user. For a speaker-dependent system, training may involve the user reading a book aloud to the computer while the system tracks the spoken words. It may also involve entering pre-recorded speech and transcribing the audio into the corresponding text. Training a speaker-independent system involves collecting recordings of the different commands and accounting for different accents, differences between male and female voices, slang, acronyms, word articulation, and temporal non-uniformity (variation in the timing of speech).

    One fascinating obstacle that speech recognition must overcome is homophones (often loosely called homonyms): words that sound the same but have different meanings, such as "their" and "there". The common solution is to examine the context in which the candidate words appear and choose the word that best fits. The same solution can be applied to all forms of voice input.

    A recent application of voice recognition technology in entertainment is the horror film Last Call. When audience members purchase their tickets, they are asked to provide their cell phone numbers. Before the film starts, the database of phone numbers for that screening is sent to the company.
At some point during the film, one audience member's cell phone rings, and it is up to that audience member to give instructions to the character on screen. In effect, the film is controlled by the voice of a random viewer. The software must also overcome the background noise of the film itself.

    Voice recognition has even reached the video game market. The defining feature of these games is that the player fully controls the game by using a microphone to issue commands to on-screen characters. The commands are interpreted by the game's voice recognition software.
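
The context-based approach to same-sounding words mentioned earlier can be illustrated with a small sketch. It is not how any particular product works. It assumes a toy corpus, two hand-picked homophone groups, and a simple scoring rule that counts how often each candidate appears next to its neighbouring words. The names `HOMOPHONE_GROUPS`, `TOY_CORPUS`, `train_bigrams`, `pick_word`, and `disambiguate` are all made up for this illustration.

```python
from collections import Counter

# Hypothetical homophone groups; a real recognizer would derive these
# from a pronunciation dictionary rather than a hand-written list.
HOMOPHONE_GROUPS = [{"their", "there"}, {"to", "too", "two"}]
HOMOPHONES = {word: group for group in HOMOPHONE_GROUPS for word in group}

# Tiny made-up corpus standing in for real training text.
TOY_CORPUS = [
    "they left their coats over there",
    "there are two books on the table",
    "i want to go there too",
    "their house is too far to walk",
]


def train_bigrams(sentences):
    """Count adjacent word pairs, with <s> and </s> marking sentence edges."""
    counts = Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        counts.update(zip(words, words[1:]))
    return counts


def pick_word(prev_word, next_word, candidates, bigrams):
    """Choose the candidate seen most often between its two neighbours."""
    def score(word):
        return bigrams[(prev_word, word)] + bigrams[(word, next_word)]
    # Sorting first makes ties resolve alphabetically, so results are stable.
    return max(sorted(candidates), key=score)


def disambiguate(words, bigrams):
    """Replace each word with the best-fitting member of its homophone group."""
    padded = ["<s>"] + list(words) + ["</s>"]
    result = []
    for i in range(1, len(padded) - 1):
        candidates = HOMOPHONES.get(padded[i], {padded[i]})
        prev_word = result[-1] if result else "<s>"
        result.append(pick_word(prev_word, padded[i + 1], candidates, bigrams))
    return result


bigrams = train_bigrams(TOY_CORPUS)
print(" ".join(disambiguate("they left there coats".split(), bigrams)))
# prints "they left their coats"
```

Counting neighbouring word pairs is only one simple way to model context. It is used here because it fits in a few lines, and a practical system would draw on far more text and a richer language model.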