Adaptive Algorithm for Speech Compression Using Cosine Packet Transform

This paper presents a new adaptive algorithm for speech compression using the cosine packet transform. The proposed algorithm uses packet decomposition, which reduces the computational complexity of the system. The paper compares the compression ratios obtained with the wavelet transform, the cosine transform, the wavelet packet transform, and the proposed adaptive cosine packet algorithm for different speech signal samples. The mean compression ratio is calculated for each method and compared. The results show that the proposed compression algorithm gives better performance for speech signals.

Speech is a very basic way for humans to convey information to one another. With a bandwidth of only 4 kHz, speech can convey information with the emotion of a human voice. People want to be able to hear someone's voice from anywhere in the world as if the person were in the same room. As a result, greater emphasis is being placed on the design of new and efficient speech coders for voice communication and transmission. Today, applications of speech coding and compression have become very numerous. Many applications involve the real-time coding of speech signals for use in mobile satellite communications, cellular telephony, and audio for videophones or video teleconferencing systems. Other applications include the storage of speech for speech synthesis and playback, or for the transmission of voice at a later time. Examples include voice mail systems, voice memo wristwatches, voice logging recorders, and interactive PC software.

Traditionally, speech coders can be classified into two categories: waveform coders and analysis/synthesis vocoders (from "voice coders"). Waveform coders attempt to copy the actual shape of the signal produced by the microphone and its associated analogue circuits. A popular waveform coding technique is pulse code modulation (PCM), which is used in telephony today. Vocoders use an entirely different approach to speech coding, known as parameter coding or analysis/synthesis coding, in which no attempt is made to reproduce the exact speech waveform at the receiver, only a signal perceptually equivalent to it. These systems provide much lower data rates by using a functional model of the human speaking mechanism at the receiver. One of the most popular techniques for analysis/synthesis coding of speech is Linear Predictive Coding (LPC). Some higher-quality vocoders include RELP (Residual Excited Linear Prediction) and CELP (Code Excited Linear Prediction).

This thesis looks at a new technique for analyzing and compressing speech signals using wavelets. Very simply, wavelets are mathematical functions of finite duration with an average value of zero that are useful in representing data or other functions. A signal can be represented by a set of scaled and translated versions of a basic function called the "mother wavelet". Taking the wavelet transform of the original signal with this set of wavelet functions yields the wavelet coefficients at different scales and positions. The coefficients represent the signal in the wavelet domain, and all data operations can be performed using just the corresponding wavelet coefficients.
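In symbols, the description above corresponds to the standard continuous wavelet transform. The notation is assumed here for illustration and is not taken from the text: \psi is the mother wavelet, a > 0 the scale, b the translation, x the signal, and W_x(a,b) the wavelet coefficient at scale a and position b.

```latex
\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right),
\qquad
\int_{-\infty}^{\infty} \psi(t)\,dt = 0,
\qquad
W_x(a,b) = \int_{-\infty}^{\infty} x(t)\,\psi_{a,b}^{*}(t)\,dt .
```

The middle condition is the zero-average property mentioned above; the discrete transforms used for coding sample a and b on a dyadic grid.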

Speech is a non-stationary random process due to the time-varying nature of the human speech production system. Non-stationary signals are characterized by numerous transitory drifts, trends, and abrupt changes. The localization feature of wavelets, along with their time-frequency resolution properties, makes them well suited for coding speech signals. In designing a wavelet-based speech coder, the major issues explored in this thesis are:

i. Choosing optimal wavelets for speech,

ii. Decomposition level in wavelet transforms,

iii. Threshold criteria for the truncation of coefficients,

iv. Efficiently representing zero-valued coefficients, and

v. Quantizing and digitally encoding the coefficients.

The performance of the wavelet compression scheme in coding speech signals and the quality of the reconstructed signals are also evaluated.
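The design issues listed above can be illustrated with a minimal sketch. It is not the coder evaluated in this work. It assumes the Haar wavelet (chosen only because it is the simplest), a fixed decomposition level, a single global hard threshold, uniform quantization, and run-length coding of zeros. All function names, the toy frame, and the parameter values are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_dwt(signal, levels):
    """Multi-level Haar DWT. Returns [approx, coarsest detail, ..., finest detail].
    Assumes len(signal) is divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.insert(0, [(a - b) / SQRT2 for a, b in pairs])
        approx = [(a + b) / SQRT2 for a, b in pairs]
    return [approx] + details

def haar_idwt(bands):
    """Inverse of haar_dwt."""
    approx = list(bands[0])
    for detail in bands[1:]:
        merged = []
        for a, d in zip(approx, detail):
            merged += [(a + d) / SQRT2, (a - d) / SQRT2]
        approx = merged
    return approx

def hard_threshold(bands, t):
    """Truncation step: zero every coefficient whose magnitude is below t."""
    return [[c if abs(c) >= t else 0.0 for c in band] for band in bands]

def quantize(bands, step):
    """Uniform quantization of coefficients to integer indices."""
    return [[int(round(c / step)) for c in band] for band in bands]

def dequantize(bands, step):
    """Map integer indices back to coefficient values."""
    return [[q * step for q in band] for band in bands]

def run_length_zeros(symbols):
    """Replace each run of zeros by a single (0, run_length) pair."""
    out, run = [], 0
    for s in symbols:
        if s == 0:
            run += 1
            continue
        if run:
            out.append((0, run))
            run = 0
        out.append(s)
    if run:
        out.append((0, run))
    return out

# Toy frame standing in for speech: a decaying sinusoid (an assumption).
frame = [math.exp(-n / 16.0) * math.sin(2 * math.pi * n / 8.0) for n in range(32)]
bands = haar_dwt(frame, levels=3)
indices = quantize(hard_threshold(bands, t=0.1), step=0.05)
stream = run_length_zeros([q for band in indices for q in band])
decoded = haar_idwt(dequantize(indices, step=0.05))
mse = sum((x - y) ** 2 for x, y in zip(frame, decoded)) / len(frame)
```

Here `stream` is what an entropy coder would receive, and `mse` measures the distortion introduced by truncation and quantization. Changing the wavelet, the level, or the threshold rule corresponds to issues i to iii above.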

It is generally believed that abrupt stimulus changes, which in speech may be time-varying frequency edges associated with consonants, transitions between consonants and vowels, and transitions within vowels, are critical to the perception of speech by humans and to speech recognition by machines. Noise affects speech transitions more than it affects quasi-steady-state speech. I believe that identifying and selectively amplifying speech transitions may enhance the intelligibility of speech in noisy conditions. The purpose of this study is to evaluate the use of wavelet transforms to identify speech transitions. Wavelet transforms may be computationally efficient and allow for real-time applications. The discrete wavelet transform (DWT), the stationary wavelet transform (SWT), and wavelet packets (WP) are evaluated. Wavelet analysis is combined with variable frame rate processing to improve the identification process. Variable frame rate processing can identify the time segments in which speech feature vectors are changing rapidly and those in which they are relatively stationary. Energy profiles for words, which show the energy in each node of a speech signal decomposed using wavelets, are used to identify nodes that contain predominantly transient information and nodes that contain predominantly quasi-steady-state information; these nodes are then used to synthesize transient and quasi-steady-state speech components. These speech components are estimates of the tonal and nontonal speech components that Yoo et al. identified using time-varying band-pass filters. Comparison of spectra, a listening test, and the mean-squared errors between the transient components synthesized using wavelets and Yoo’s nontonal components indicated that wavelet packets gave the best estimates of Yoo’s components. An algorithm that incorporates variable frame rate analysis into wavelet packet analysis is proposed.
The development of this algorithm involves choosing a wavelet function and a decomposition level. The algorithm itself has four steps: wavelet packet decomposition, classification of terminal nodes, incorporation of variable frame rate processing, and synthesis of speech components.
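The first step and the energy profile described above can be sketched as follows. This is a minimal illustration only: it assumes the Haar filter pair, a two-level decomposition, and a synthetic input frame, none of which are taken from the text. The rule for classifying terminal nodes and the variable frame rate step are not specified here, so they are deliberately left out. Function names are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One Haar analysis step: return (low-pass, high-pass) halves of x."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def wavelet_packet_nodes(x, level):
    """Full wavelet packet tree: split EVERY node at each level.
    Returns the 2**level terminal nodes in natural (filter-bank) order,
    which is not the same as increasing-frequency order."""
    nodes = [list(x)]
    for _ in range(level):
        next_nodes = []
        for node in nodes:
            low, high = haar_split(node)
            next_nodes += [low, high]
        nodes = next_nodes
    return nodes

def energy_profile(nodes):
    """Fraction of the total energy held by each terminal node.
    Assumes the signal is not identically zero."""
    energies = [sum(c * c for c in node) for node in nodes]
    total = sum(energies)
    return [e / total for e in energies]

# Synthetic frame (an assumption): slow sinusoid plus one abrupt click.
frame = [math.sin(2 * math.pi * n / 16.0) for n in range(32)]
frame[20] += 1.5
nodes = wavelet_packet_nodes(frame, level=2)
profile = energy_profile(nodes)
```

Unlike the DWT, which splits only the low-pass branch, the packet tree also splits the high-pass branches, so `profile` has one entry per terminal node across the whole band.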

With the rapid deployment of speech compression technologies, more and more speech content is stored and transmitted in compressed formats. A speech signal has unique properties that distinguish it from general audio and music signals: it is more structured and is band-limited to around 4 kHz. These two facts can be exploited through different models and approaches and, in the end, make speech easier to compress. Today, applications of speech compression involve real-time processing in mobile satellite communication, cellular telephony, Internet telephony, and audio for videophones or video teleconferencing systems, among others. Other applications include storage and synthesis systems used, for example, in voice mail systems, voice memo wristwatches, voice logging recorders, and interactive PC software. The idea of speech compression is to make a speech signal take up less storage space and less transmission bandwidth. To meet this goal, different compression methods have been designed and developed by various researchers. Speech compression is used in digital telephony, in multimedia, and in the security of digital communications. Before the introduction of packet-based transform techniques, audio coding techniques used the DFT and DCT with window functions such as rectangular and sine-taper functions. However, these early coding techniques failed to fulfill the contradictory requirements imposed by high-quality audio coding. For example, with a rectangular window the analysis/synthesis system is critically sampled, i.e., the overall number of transform-domain samples is equal to the number of time-domain samples, but the system suffers from poor frequency resolution and from block effects, which are introduced after quantization or other manipulation in the frequency domain. Overlapped windows allow for better frequency response functions but carry the penalty of additional values in the frequency domain, so the system is no longer critically sampled.
The discrete cosine packet transform is currently the best solution, having satisfactorily resolved this paradox.
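The rectangular-window case described above can be made concrete with a small sketch. It assumes an orthonormal DCT-II computed directly (not a fast algorithm), a block size of 8, and a ramp as test signal; keeping only the DC coefficient of each block stands in for very coarse quantization. These choices and all names are illustrative assumptions, not the paper's setup.

```python
import math

def dct(block):
    """Orthonormal DCT-II of one block (direct O(N^2) form)."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        out.append(scale * sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, x in enumerate(block)))
    return out

def idct(coeffs):
    """Inverse of dct (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        out.append(sum(math.sqrt((1.0 if k == 0 else 2.0) / n) * c
                       * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                       for k, c in enumerate(coeffs)))
    return out

def block_transform(signal, size):
    """Rectangular window: non-overlapping blocks, each transformed separately."""
    return [dct(signal[i:i + size]) for i in range(0, len(signal), size)]

def block_inverse(blocks):
    """Concatenate the inverse transforms of all blocks."""
    return [x for b in blocks for x in idct(b)]

ramp = [float(n) for n in range(16)]
blocks = block_transform(ramp, size=8)

# Critical sampling: as many transform coefficients as time samples.
n_coeffs = sum(len(b) for b in blocks)

# Crude stand-in for coarse quantization: keep only the DC term per block.
coarse = [[b[0]] + [0.0] * (len(b) - 1) for b in blocks]
decoded = block_inverse(coarse)

# The original ramp rises by 1 per sample; after coarse coding, the
# reconstruction steps abruptly at the block boundary (the block effect).
jump = decoded[8] - decoded[7]
```

Without any manipulation of the coefficients, `block_inverse(blocks)` recovers the ramp exactly; the discontinuity appears only once the coefficients are altered, which is the behaviour the paragraph attributes to rectangular windows.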

Speech compression is performed by methods based either on linear prediction or on orthogonal transforms. Building on the classical papers of Shannon and Kolmogorov, a strong connection was recently highlighted between the systems proposed in many lossy compression standards and harmonic analysis. All these systems use orthogonal transforms. The algorithm described in this paper belongs to the second category. Unfortunately, there is no fast algorithm for computing the optimal orthogonal transform, which is why other orthogonal transforms are used in practice. The quality of a compression system can be assessed with the aid of its rate-distortion function: one compression system is better than another if, at equal distortion, it achieves a higher compression rate. The compression rate can be maximized by a good choice of orthogonal transform.
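The comparison criterion stated above (higher compression rate at equal distortion) can be sketched numerically. The sketch rests on several assumptions that are not from the paper: the compression ratio is taken as the number of samples divided by the number of retained coefficients, the cost of coding coefficient positions and values is ignored, and the test signal is a pure cosine chosen so that the DCT represents it with one coefficient. Because both transforms compared are orthonormal, the mean squared error caused by discarding coefficients equals the discarded energy divided by the signal length.

```python
import math

def dct(x):
    """Orthonormal DCT-II (direct O(N^2) form)."""
    n = len(x)
    return [math.sqrt((1.0 if k == 0 else 2.0) / n)
            * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i, v in enumerate(x))
            for k in range(n)]

def kept_for_distortion(coeffs, max_mse):
    """Smallest number of coefficients that must be kept so that the
    MSE from discarding the rest does not exceed max_mse."""
    n = len(coeffs)
    discarded, dropped = 0.0, 0
    for energy in sorted(c * c for c in coeffs):   # drop the smallest first
        if (discarded + energy) / n > max_mse:
            break
        discarded += energy
        dropped += 1
    return n - dropped

def compression_ratio(coeffs, max_mse):
    """Hypothetical ratio: samples per retained coefficient."""
    return len(coeffs) / max(kept_for_distortion(coeffs, max_mse), 1)

N = 32
signal = [math.cos(math.pi * (2 * n + 1) * 3 / (2 * N)) for n in range(N)]

budget = 1e-6                                      # same distortion for both
ratio_identity = compression_ratio(signal, budget)     # no transform
ratio_dct = compression_ratio(dct(signal), budget)     # cosine transform
```

For this deliberately favourable signal the cosine transform concentrates all the energy in a single coefficient, so `ratio_dct` is far larger than `ratio_identity` at the same distortion budget. Real speech is less extreme, but the same comparison applies.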

This paper is organized as follows. First, the mathematical model of the speech signal and a description of the discrete cosine transform are presented. Then, with the necessary mathematical modeling, the proposed adaptive algorithm for speech compression is explained. Finally, the developed algorithm is tested on various speech signals and signal samples, and a comparison is made with the wavelet transform, the cosine transform, and the wavelet packet transform.

Conclusion

A new compression method based on an adaptive threshold detector is proposed and tested. The simulation results show that the proposed algorithm gives a better compression ratio than the other methods. Using the proposed method, a mean compression rate of 28.275 was obtained in the simulations. This value is superior to the mean compression rates of the other methods. Using a fast DCT algorithm, the proposed method can be implemented on a digital signal processor. The proposed system is a good alternative to speech compression systems based on linear prediction.