Speaker recognition is broadly divided into two tasks, speaker identification and speaker verification, and it is the method of automatically recognizing who is speaking on the basis of individual information embedded in speech waves. Speaker recognition is widely applicable wherever a speaker's voice is used to verify identity and control access to services such as banking by telephone, database access services, voice dialling, telephone shopping, information services, voice mail, security control for confidential information areas, and remote access to computers. AT&T and TI, together with Sprint, have started field tests and actual applications of speaker recognition technology; Sprint's Voice Phone Card is already used by many customers. Speaker recognition is one of the most promising technologies for creating new services that will make our everyday lives more secure. Another important application of speaker recognition technology is forensics. Speaker recognition has been an appealing research field for the last few decades and still poses a number of unsolved problems.
The main aim of this project is speaker identification, which consists of comparing a speech signal from an unknown speaker against a database of known speakers. The system, after being trained with a number of speakers, can then recognize the speaker. The figure below shows the fundamental structure of speaker identification and verification systems. Speaker identification is the process of determining which registered speaker produced a given utterance; speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. In most applications, voice is used as the key to confirm a speaker's identity.
The above structure of speaker identification and verification systems can also be modified to cover the open-set identification case, in which a reference model for the unknown speaker may not exist. This is usually the case in forensic applications. In these circumstances an additional decision alternative, "the unknown does not match any of the models", is required. A threshold test can also be used in both verification and identification to decide whether the match is close enough to accept the decision or whether more speech data are needed.
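As a rough sketch of how such a threshold test could look in Matlab (the speaker names, distances and threshold value below are purely illustrative, not taken from this project):

% Hypothetical open-set decision: distances holds the average Euclidean
% distortion of the test utterance against each registered speaker's
% codebook, names holds the matching speaker labels.
names     = {'speakerA', 'speakerB', 'speakerC'};
distances = [7.0, 10.1, 6.0];
threshold = 8.0;                         % illustrative acceptance threshold
[dmin, idx] = min(distances);            % closest registered speaker
if dmin < threshold
    fprintf('The test voice is most likely from %s\n', names{idx});
else
    fprintf('The test voice does not match any registered speaker\n');
end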
Speaker recognition can also be divided into two kinds of methods: text-dependent and text-independent. Text-dependent methods require the speaker to say key words or sentences with the same text for both training and recognition trials, whereas text-independent methods do not rely on a specific text being spoken. Formerly, text-dependent methods were the most widely used, but text-independent methods are now common. Both text-dependent and text-independent methods share a problem, however.
These systems can be easily deceived by playing back the recorded voice of a registered speaker. Different techniques are used to cope with such attacks. For example, a small set of words or digits is used as input, and each user is prompted to utter a specified sequence of key words that is randomly selected every time the system is used. Still, this method is not completely reliable: it can be deceived by highly developed electronic recording systems that can replay the secret key words in the requested order. Therefore, T. Matsui and S. Furui have proposed a text-prompted speaker recognition method.
Speech Feature Extraction:
In this project the most important step is to extract features from the speech signal. Speech feature extraction, viewed as a classification problem, is about reducing the dimensionality of the input vector while maintaining the discriminating power of the signal. As we know from the fundamental structure of speaker identification and verification systems above, the number of training and test vectors needed for the classification problem grows exponentially with the dimension of the input vector, so feature extraction is needed.
The extracted features should, however, meet some criteria when dealing with the speech signal:
They should be easy to measure.
They should distinguish between speakers while being tolerant of intra-speaker variability.
They should not be susceptible to mimicry.
They should show little fluctuation from one speaking environment to another.
They should be stable over time.
They should occur frequently and naturally in speech.
In this project we use the Mel Frequency Cepstral Coefficients (MFCC) technique to extract features from the speech signal and compare the unknown speaker with the existing speakers in the database. The figure below shows the complete pipeline of Mel Frequency Cepstral Coefficient computation.
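As a rough illustration of this feature extraction step (a sketch under stated assumptions, not the project's exact code), the snippet below reads one training utterance and computes its MFCC matrix with the VOICEBOX function melcepst mentioned in the conclusion; the file name is an assumption.

% Sketch of MFCC feature extraction using the VOICEBOX toolbox.
% 'train_brian.wav' is an illustrative file name; melcepst is called
% with its default analysis settings.
[speech, fs] = audioread('train_brian.wav');   % samples and sampling rate
trainMFCC    = melcepst(speech, fs);           % one MFCC row vector per frame
% trainMFCC is a (number of frames) x (number of coefficients) matrix
% that is later clustered into a VQ codebook for this speaker.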
Result:
For example, we are going to test a speech wave file recorded by Brian, called 'test_brian.wav'. Assume that at the beginning we do not know the speaker is Brian; therefore we need to feed the .wav file into our speaker recognition system to find out who the speaker is. We run the program twice in order to confirm the result. The Matlab output is as follows:
% First run
>> speakerID('test_brian')
Loading data...
Calculating mel-frequency cepstral coefficients for training set...
Harry
Carli
Brian
In___
Hojin
Performing K-means...
Calculating mel-frequency cepstral coefficients for test set...
Compute a distortion measure for each codebook...
Display the result...
The average of Euclidean distances between database and test wave file
Harry
7.0183
Carli
10.0679
Brian
5.9630
In___
8.4237
Hojin
7.6526
The test voice is most likely from
Brian
% Second run
>> speakerID('test_brian')
Loading data...
Calculating mel-frequency cepstral coefficients for training set...
Harry
Carli
Brian
In___
Hojin
Performing K-means...
Calculating mel-frequency cepstral coefficients for test set...
Compute a distortion measure for each codebook...
Display the result...
The average of Euclidean distances between database and test wave file
Harry
6.9995
Carli
9.9876
Brian
5.8339
In___
8.7075
Hojin
7.6390
The test voice is most likely from
Brian
From the above Matlab outputs, we obtain 5 measurements for each run, which are the calculated Euclidean distances between the test wave file and the codebooks in the database. We can see that, compared with the other codebooks in the database, the distortion distances calculated for Brian have the smallest values in both runs, 5.9630 and 5.8339. Therefore, we can conclude that the speaker is Brian, according to the rule that the most likely speaker's voice has the smallest Euclidean distance to its codebook in the database.
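The sketch below shows one plausible way of computing this distortion measure in Matlab; the function and variable names are illustrative, and the actual speakerID implementation may differ.

% Sketch of the VQ distortion measure: for every MFCC frame of the test
% utterance, find the nearest codeword in one speaker's codebook and
% average those minimum Euclidean distances.
% testMFCC : (frames x coefficients) feature matrix of the test utterance
% codebook : (codewords x coefficients) VQ codebook of a registered speaker
function d = vqDistortion(testMFCC, codebook)
    nFrames = size(testMFCC, 1);
    minDist = zeros(nFrames, 1);
    for i = 1:nFrames
        diffs      = bsxfun(@minus, codebook, testMFCC(i, :));  % frame vs. all codewords
        minDist(i) = min(sqrt(sum(diffs.^2, 2)));               % nearest codeword
    end
    d = mean(minDist);                                          % average distortion
end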
Conclusion:
The goal of this project was to create a speaker recognition system and apply it to the speech of an unknown speaker, by investigating the extracted features of the unknown speech and then comparing them to the stored extracted features of each speaker in the database in order to identify the unknown speaker.
Feature extraction is done using MFCC (Mel Frequency Cepstral Coefficients). The function 'melcepst' is used to calculate the mel cepstrum of a signal. Each speaker is modeled using Vector Quantization (VQ): a VQ codebook is generated by clustering the training feature vectors of each speaker and is then stored in the speaker database. In this method, the K-means algorithm is used for the clustering. In the recognition stage, a distortion measure based on minimizing the Euclidean distance is used when matching an unknown speaker against the speaker database. During this project, we found that the VQ-based clustering approach provides a fast speaker identification process.
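A minimal sketch of this codebook-generation step is given below; it assumes Matlab's built-in kmeans function (Statistics and Machine Learning Toolbox) and an illustrative codebook size, and may differ from the clustering code actually used in speakerID.

% Sketch of VQ codebook generation: cluster one speaker's training MFCC
% vectors into a small codebook with K-means.
% trainMFCC : (frames x coefficients) matrix, e.g. the output of melcepst.
numCodewords = 16;                                     % illustrative codebook size
[~, codebook] = kmeans(trainMFCC, numCodewords, ...
                       'Replicates', 3, 'MaxIter', 200);
% codebook (numCodewords x coefficients) is stored in the speaker database
% and later compared against test utterances with the distortion measure above.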