Publication Date

1995

Document Type

Dissertation/Thesis

First Advisor

Tahernezhadi, Mansour

Degree Name

M.S. (Master of Science)

Legacy Department

Department of Electrical Engineering

LCSH

Speech processing systems; Automatic speech recognition

Abstract

This thesis employs line spectral pairs (LSPs) for speech recognition and coding. LSPs are of interest largely because of their localized spectral sensitivity, i.e., a perturbation of a single LSP does not affect the entire spectrum. This property distinguishes LSPs from linear predictive coding (LPC) parameters. The eigenvalue method is employed to calculate LSPs from two polynomials formed from the LPC coefficients. The LPC coefficients are obtained by the Levinson-Durbin algorithm or its variants, e.g., the Schur algorithm and the lattice algorithm. The split algorithms, which reduce the computation by almost a factor of two compared with the classical algorithm, are also presented.

LSPs were applied to speech coding and to speaker-dependent (SD) isolated word recognition (IWR). A two-dimensional scalar quantizer and a vector quantizer were designed to remove the interframe and intraframe LSP correlations. Both template-matching-based dynamic time warping (DTW) and statistical-pattern-recognition-based hidden Markov models (HMMs) were employed to perform IWR. DTW is efficient for small-vocabulary IWR but not suitable for large-vocabulary IWR, for which the HMM method is favored.

Discrete observation density HMM (DD-HMM) uses vector quantization (VQ) to map the input LPC or LSP vectors to a sequence of codebook indices; the HMM parameters are then re-estimated from the generated index sequences, starting from proper initial estimates of the model parameters. Continuous observation density HMM (CD-HMM), on the other hand, estimates the HMM parameters directly from the continuous input vectors (e.g., 10 LSPs per frame of speech) rather than using VQ to transform each input vector into an integer codebook index. CD-HMM uses a mixture of multivariate Gaussian density functions to approximate a pdf of arbitrary shape for the input vectors.
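The LPC-to-LSP front end summarized in the abstract can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the thesis's code: it assumes the autocorrelation method of linear prediction, the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p, and that the "eigenvalue method" amounts to finding the roots of the sum and difference polynomials P(z) and Q(z) as eigenvalues of their companion matrices, which is how `numpy.roots` computes roots. The function names `autocorrelation`, `levinson_durbin`, and `lpc_to_lsf` are hypothetical, and the split, Schur, and lattice variants are not shown.

```python
import numpy as np


def autocorrelation(frame: np.ndarray, order: int) -> np.ndarray:
    """Biased autocorrelation r[0..order] of one windowed frame (assumed method)."""
    n = len(frame)
    return np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])


def levinson_durbin(r: np.ndarray, order: int) -> tuple[np.ndarray, float]:
    """Classical Levinson-Durbin recursion.

    Returns a[0..order] for A(z) = 1 + a1 z^-1 + ... + ap z^-p (a[0] = 1)
    and the final prediction-error energy.
    """
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection (PARCOR) coefficient of stage i.
        k = -(r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])) / err
        prev = a.copy()
        # Order update: a_j <- a_j + k * a_{i-j}, j = 1..i-1.
        a[1:i] = prev[1:i] + k * prev[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


def lpc_to_lsf(a: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Line spectral frequencies (radians, ascending) from LPC coefficients."""
    ext = np.append(a, 0.0)       # A(z) padded to degree p+1 in z^-1
    p_poly = ext + ext[::-1]      # P(z) = A(z) + z^-(p+1) A(1/z)
    q_poly = ext - ext[::-1]      # Q(z) = A(z) - z^-(p+1) A(1/z)
    lsf = []
    for poly in (p_poly, q_poly):
        # numpy.roots returns the eigenvalues of the companion matrix.
        angles = np.angle(np.roots(poly))
        # Keep one root per conjugate pair; drop trivial roots at z = +1 and z = -1.
        lsf.extend(w for w in angles if eps < w < np.pi - eps)
    return np.sort(np.array(lsf))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = rng.standard_normal(240) * np.hamming(240)  # stand-in for a speech frame
    a, _ = levinson_durbin(autocorrelation(frame, 10), 10)
    print(np.round(lpc_to_lsf(a), 3))  # 10 ordered LSFs in (0, pi)
```

Because the autocorrelation method yields a minimum-phase A(z), the roots of P(z) and Q(z) lie on the unit circle and interlace, so for even order p the sketch returns exactly p ordered frequencies once the trivial roots at z = +1 and z = -1 are discarded.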
A single-mixture Gaussian pdf with a diagonal covariance matrix was employed in this thesis because of its simplicity. A diagonal covariance matrix performs better than a full covariance matrix when the training database is not large and diverse enough to estimate a full matrix accurately, as is the case in this project. Reported results indicate that LSPs achieve a higher recognition rate than their LPC-based counterparts with both DTW and HMM.
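A single-mixture, diagonal-covariance Gaussian observation density of the kind described above can be sketched in Python with NumPy. The function names and the variance floor are hypothetical choices, not taken from the thesis, and the plain maximum-likelihood fit assumes the frames have already been assigned to one HMM state (Baum-Welch re-estimation would instead weight each frame by its state occupancy). For d = 10 LSPs per frame, the diagonal model needs 10 means plus 10 variances (20 parameters) per state, whereas a full covariance model needs 10 means plus 55 distinct covariance entries (65 parameters), which is why the diagonal form is easier to estimate from limited training data.

```python
import numpy as np


def fit_diag_gaussian(frames: np.ndarray, var_floor: float = 1e-6) -> tuple[np.ndarray, np.ndarray]:
    """ML mean and per-dimension variance from (T, d) frames of one state.

    The variance floor (a hypothetical setting) keeps rarely varying
    dimensions from producing zero variances.
    """
    mean = frames.mean(axis=0)
    var = np.maximum(frames.var(axis=0), var_floor)
    return mean, var


def diag_gaussian_logpdf(x: np.ndarray, mean: np.ndarray, var: np.ndarray) -> np.ndarray:
    """log N(x; mean, diag(var)) for one vector (d,) or a batch of frames (T, d).

    With a diagonal covariance the determinant is the product of the variances
    and the quadratic form is a per-dimension sum, so no matrix inverse is needed.
    """
    d = mean.shape[0]
    return -0.5 * (d * np.log(2.0 * np.pi)
                   + np.sum(np.log(var))
                   + np.sum((x - mean) ** 2 / var, axis=-1))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    train = rng.normal(size=(200, 10))   # stand-in for one state's 10-LSP frames
    mean, var = fit_diag_gaussian(train)
    test = rng.normal(size=(30, 10))
    # Per-frame log b_j(o_t) values that an HMM forward pass would consume.
    print(diag_gaussian_logpdf(test, mean, var).shape)
```

In a CD-HMM each state would hold its own (mean, var) pair, and these per-frame log-likelihoods would replace the discrete codebook-index probabilities used by a DD-HMM.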

Comments

Includes bibliographical references (pages [109]-111)

Recommended Citation

Li, Ren, "LSP-based speech coding and recognition" (1995). Graduate Research Theses & Dissertations. 3840.
https://huskiecommons.lib.niu.edu/allgraduate-thesesdissertations/3840

Extent

ix, 111 pages

Language

eng

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text

Graduate Research Theses & Dissertations

LSP-based speech coding and recognition