Speaker recognition or voice recognition is the task of recognizing people from their voices. Such systems extract features from speech, model them and use them to recognize the person from his/her voice. Speaker recognition has a history dating back some four decades, where the output of several analog filters was averaged over time for matching. Speaker recognition uses the acoustic features of speech that have been found to differ between individuals. These acoustic patterns reflect both anatomy (e.g., size and shape of the throat and mouth) and learned behavioral patterns (e.g., voice pitch, speaking style). This incorporation of learned patterns into the voice templates (the latter called "voiceprints") has earned speaker recognition its classification as a "behavioral biometric."
Speaker recognition systems employ three styles of spoken input: text-dependent, text-prompted and text-independent. Most speaker verification applications use text-dependent input, which involves selection and enrollment of one or more voice passwords. Text-prompted input is used whenever there is concern of imposters. The various technologies used to process and store voiceprints includes hidden Markov models, pattern matching algorithms, neural networks, matrix representation and decision trees. Some systems also use "anti-speaker" techniques, such as cohort models, and world models. Ambient noise levels can impede both collection of the initial and subsequent voice samples. Performance degradation can result from changes in behavioral attributes of the voice and from enrollment using one telephone and verification on another telephone. Voice changes due to aging also need to be addressed by recognition systems.
Many companies market speaker recognition engines, often as part of large voice processing, control and switching systems. Capture of the biometric is seen as non-invasive. The technology needs little additional hardware by using existing microphones and voice-transmission technology allowing recognition over long distances via ordinary telephones (wire line or wireless). Multi-layered networks are capable of performing just about any linear or nonlinear computation, and can approximate any reasonable function arbitrarily well. Such networks overcome the problems associated with the perceptron and linear networks. However, while the network being trained may be theoretically capable of performing correctly, back propagation and its variations may not always find a solution. There are many types of neural networks for various applications multilayered perceptrons (MLPs) are feedforward networks and universal approximators. They are the simplest and therefore most commonly used neural network architectures.
Index Terms: Matlab, speaker recognition, speaker verification, speaker matching, neural networks, feature extraction, ann, artificial neural networks, nn.
Figure 1. Speech signal | |||
A simple and effective source code for Speaker Identification based on Neural Networks |
0 comments:
Post a Comment