Publication Date

2019

Document Type

Dissertation/Thesis

First Advisor

Tahernezhadi, Mansour

Degree Name

M.S. (Master of Science)

Legacy Department

Department of Electrical Engineering

Abstract

Hearing aids, automatic speech recognition (ASR), and many other communication systems work well when there is a single sound source and little echo, but their performance degrades when several speakers talk simultaneously or reverberation is high. Speech separation and speech enhancement are core problems in audio signal processing. Humans are remarkably capable of focusing their auditory attention on a single sound source in a noisy environment, de-emphasizing all other voices and interference in the surroundings. This capability comes naturally to us, yet speech separation remains a significant challenge for computers. It is difficult for several reasons: the wide variety of sound types, the range of mixing environments, and the lack of a clear procedure for distinguishing sources, especially when they sound similar. In addition, perceiving speech in low signal-to-noise ratio (SNR) conditions is hard for hearing-impaired listeners. The motivation of this work is therefore to advance speech separation algorithms and improve the intelligibility of noisy speech. Recent technologies aim to give machines similar abilities, and deep neural network methods have achieved impressive success on a variety of problems, including speech enhancement, the task of separating clean speech from a noisy mixture.

Owing to advances in deep learning, speech separation can be viewed as a classification problem and treated as a supervised learning task. The three main components of speech separation or speech enhancement with deep learning are the acoustic features, the learning machine, and the training targets. This work implements a single-channel speech separation and enhancement algorithm based on deep neural networks (DNNs). An extensive set of speech from different speakers together with noise data is collected to train a neural network model that predicts time-frequency masks from noisy mixture signals. The algorithm is tested with various noises and combinations of different speakers, and its performance is evaluated in terms of speech quality and intelligibility.
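
The abstract does not spell out which time-frequency mask is used as the training target; the following is a minimal Python/NumPy sketch assuming an ideal-ratio-mask style target computed from short-time Fourier transform magnitudes (the frame length, hop size, and mask formula are illustrative assumptions, not the thesis's actual settings).

import numpy as np

def stft_mag(signal, frame_len=512, hop=256):
    # Magnitude spectrogram from a Hann-windowed short-time Fourier transform.
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))   # shape: (frames, freq_bins)

def ideal_ratio_mask(clean, noise, eps=1e-8):
    # Time-frequency mask in [0, 1]: fraction of magnitude attributed to speech.
    s, n = stft_mag(clean), stft_mag(noise)
    return s / (s + n + eps)

In such a setup, the network is fed features of the noisy mixture and trained to predict this mask; at inference the predicted mask is applied to the mixture spectrogram before resynthesis.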

In this thesis, I propose using the gated recurrent unit (GRU), a variant of the recurrent neural network, for the speech separation and speech enhancement task. The GRU is a simpler model than the long short-term memory (LSTM) network commonly used for these tasks: it has fewer parameters while matching the LSTM's separation and enhancement performance.
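
As a rough illustration of the parameter savings, the following PyTorch sketch builds the same two-layer recurrent mask estimator with either cell type; the layer sizes and overall network shape are assumptions for illustration, not the architecture used in the thesis.

import torch
import torch.nn as nn

class MaskEstimator(nn.Module):
    # Recurrent network mapping noisy spectral frames to a time-frequency mask.
    def __init__(self, n_bins=257, hidden=256, cell="gru"):
        super().__init__()
        rnn_cls = nn.GRU if cell == "gru" else nn.LSTM
        self.rnn = rnn_cls(n_bins, hidden, num_layers=2, batch_first=True)
        self.out = nn.Sequential(nn.Linear(hidden, n_bins), nn.Sigmoid())

    def forward(self, noisy_mag):        # noisy_mag: (batch, frames, bins)
        h, _ = self.rnn(noisy_mag)
        return self.out(h)               # predicted mask in [0, 1]

# A GRU layer has three gates versus the LSTM's four, so at equal hidden
# size it needs roughly 25% fewer recurrent parameters.
gru_params = sum(p.numel() for p in MaskEstimator(cell="gru").parameters())
lstm_params = sum(p.numel() for p in MaskEstimator(cell="lstm").parameters())
print(gru_params, lstm_params)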

Extent

90 pages

Language

eng

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text
