M.S. (Master of Science)
Department of Electrical Engineering
Speech processing systems; Automatic speech recognition
This thesis employs line spectral pairs (LSPs) for speech recognition and coding. LSPs are of interest largely because of their localized spectral sensitivity, i.e., perturbing a single LSP alters the spectrum only near that LSP's frequency rather than across the entire spectrum. This property distinguishes LSPs from linear prediction (LPC) parameters. The eigenvalue method is employed to calculate the LSPs from two polynomials formed from the LPC coefficients. The LPC coefficients are obtained by the Levinson-Durbin algorithm or one of its variations, e.g., the Schur algorithm and the lattice algorithm. The split algorithms, which reduce computation by almost a factor of two compared with the classical algorithm, are also presented. LSPs were applied to speech coding and to speaker-dependent (SD) isolated word recognition (IWR). A two-dimensional scalar quantizer and a vector quantizer were designed to remove the interframe and intraframe LSP correlations. Both template-matching-based dynamic time warping (DTW) and statistical-pattern-recognition-based hidden Markov models (HMMs) were employed to perform IWR. DTW is efficient for small-vocabulary IWR but is not suitable for large-vocabulary IWR, for which the HMM method is favored. The discrete observation density HMM (DD-HMM) uses vector quantization (VQ) to generate a sequence of codebook indices from the input LPCs or LSPs; the HMM parameters are then re-estimated from the generated index sequences, given proper initial estimates of the model parameters. The continuous observation density HMM (CD-HMM), on the other hand, estimates the HMM parameters directly from the continuous input vectors (e.g., 10 LSPs for each frame of speech) rather than using a VQ to transform each input vector into an integer codebook index. CD-HMM uses a mixture of multivariate Gaussian distribution functions to approximate an arbitrarily shaped probability density function (pdf) of the input vectors.
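The LPC-to-LSP front end summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function names `levinson_durbin` and `lsp_polynomials` and the sample autocorrelation values are hypothetical. The sketch runs the classical Levinson-Durbin recursion on an autocorrelation sequence and then forms the symmetric and antisymmetric polynomials whose unit-circle root angles are the LSPs. The root-finding step itself (the eigenvalue method mentioned in the abstract) is omitted.

```python
def levinson_durbin(r, order):
    """Classical Levinson-Durbin recursion.

    r     : autocorrelation values r[0..order]
    order : LPC order p
    Returns (a, err) where a = [1, a1, ..., ap] are the coefficients of
    A(z) = 1 + a1*z^-1 + ... + ap*z^-p and err is the final
    prediction-error energy.
    """
    a = [1.0]
    err = r[0]
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                      # reflection coefficient
        a_ext = a + [0.0]
        # Order update: a_new[j] = a[j] + k * a[i - j]
        a = [a_ext[j] + k * a_ext[i - j] for j in range(i + 1)]
        err *= (1.0 - k * k)
    return a, err


def lsp_polynomials(a):
    """Form P(z) = A(z) + z^-(p+1) A(1/z) and Q(z) = A(z) - z^-(p+1) A(1/z).

    Both are returned as coefficient lists of length p + 2. For a
    minimum-phase A(z), their roots lie on the unit circle and the root
    angles are the line spectral frequencies.
    """
    padded = list(a) + [0.0]
    flipped = [0.0] + list(a)[::-1]
    p_poly = [x + y for x, y in zip(padded, flipped)]
    q_poly = [x - y for x, y in zip(padded, flipped)]
    return p_poly, q_poly


# Illustrative autocorrelation values (assumed, not taken from the thesis).
r_example = [1.0, 0.5, 0.25]
a_example, err_example = levinson_durbin(r_example, order=2)
p_example, q_example = lsp_polynomials(a_example)
```

The coefficient list of P(z) reads the same forwards and backwards, while that of Q(z) changes sign when reversed. That symmetry is what allows the roots to be found on the unit circle alone.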
A single-mixture Gaussian pdf with a diagonal covariance matrix was employed in this thesis because of its simplicity. A diagonal covariance matrix performs better than a full covariance matrix when the training database is not large and diverse enough to estimate a full matrix accurately, which is the case in this project. Reported results indicate that LSPs achieve a higher recognition rate than their LPC-based counterparts with both DTW and HMM.
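The single-Gaussian, diagonal-covariance observation density described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's code: the function names, the variance floor, and the toy two-dimensional frames are hypothetical. With a diagonal covariance matrix only one mean and one variance per feature dimension are estimated, which is why the model remains usable with limited training data.

```python
import math


def fit_diag_gaussian(frames, floor=1e-6):
    """Maximum-likelihood mean and per-dimension variance.

    frames : list of equal-length feature vectors (e.g., LSP vectors)
    floor  : lower bound on each variance (an assumption added here to
             avoid division by zero; not described in the abstract)
    """
    n = len(frames)
    dim = len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [
        max(sum((f[d] - mean[d]) ** 2 for f in frames) / n, floor)
        for d in range(dim)
    ]
    return mean, var


def diag_gaussian_logpdf(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance.

    Because the covariance is diagonal, the log density is a sum of
    independent one-dimensional Gaussian log densities.
    """
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


# Toy training frames (assumed values, two dimensions for brevity).
train = [[0.0, 0.0], [2.0, 4.0]]
mean, var = fit_diag_gaussian(train)
score = diag_gaussian_logpdf([1.0, 2.0], mean, var)
```

In an HMM recognizer, a score of this kind would serve as the per-state observation log-likelihood for each frame. A full-covariance model would instead need on the order of the square of the feature dimension in parameters per state.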
Li, Ren, "LSP-based speech coding and recognition" (1995). Graduate Research Theses & Dissertations. 3840.
ix, 111 pages
Northern Illinois University
Rights Statement 2