Publication Date

5-5-2017

Document Type

Dissertation/Thesis

First Advisor

Alhoori, Hamed

Degree Name

B.S. (Bachelor of Science)

Legacy Department

Department of Computer Science

Abstract

This paper outlines an effective process for transcribing conversations from an audio file. The process combines speech recognition with speaker recognition to prepare the audio signals for transcription without relying on a database of preexisting vocal models. It is intended for multi-speaker conversations where vocal models are unavailable or cannot be created from the amount of data provided. We conclude that the performance of speech recognition on multi-speaker conversations can be improved by using the classifying properties of speaker recognition to reduce variance in the dataset, producing a result that is just as effective as if we were performing mono-speaker speech recognition.
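
The idea of grouping audio by speaker before transcription, without enrolled vocal models, can be illustrated with a minimal sketch. This is not the method used in the thesis, which the abstract does not specify. It assumes each audio segment has already been reduced to a small feature vector, and it substitutes a simple greedy threshold clustering for the speaker recognition step. The names `Segment`, `group_by_speaker`, `split_streams`, and the `threshold` parameter are all hypothetical.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Segment:
    start: float     # segment start time in seconds
    end: float       # segment end time in seconds
    features: tuple  # hypothetical per-segment voice features


def group_by_speaker(segments, threshold=1.0):
    """Assign a speaker label to each segment with greedy clustering.

    No preexisting vocal models are used: a new speaker is created
    whenever a segment is farther than `threshold` (an assumed tuning
    value) from every speaker centroid discovered so far.
    """
    centroids = []  # running mean feature vector per discovered speaker
    counts = []     # number of segments assigned to each speaker
    labels = []
    for seg in segments:
        # Find the nearest existing speaker centroid, if any.
        best, best_d = None, None
        for i, c in enumerate(centroids):
            d = dist(c, seg.features)
            if best_d is None or d < best_d:
                best, best_d = i, d
        if best is None or best_d > threshold:
            # Too far from every known speaker: start a new one.
            centroids.append(tuple(seg.features))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean of the matched speaker.
            n = counts[best] + 1
            centroids[best] = tuple(
                c + (f - c) / n for c, f in zip(centroids[best], seg.features)
            )
            counts[best] = n
            labels.append(best)
    return labels


def split_streams(segments, labels):
    """Collect segments into one list per speaker label."""
    streams = {}
    for seg, label in zip(segments, labels):
        streams.setdefault(label, []).append(seg)
    return streams


# Toy data: two alternating voices with made-up 2-D features.
segments = [
    Segment(0.0, 2.0, (1.0, 1.0)),
    Segment(2.0, 4.0, (5.0, 5.0)),
    Segment(4.0, 6.0, (1.2, 0.9)),
    Segment(6.0, 8.0, (5.1, 4.8)),
]
labels = group_by_speaker(segments)
streams = split_streams(segments, labels)
```

Each per-speaker stream could then be passed to a speech recognizer on its own, so that the recognizer sees one voice at a time. That transcription step is deliberately left out here, since it depends on the recognizer chosen.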

Comments

An application built to implement the ideas in this paper is hosted at https://freescribe.org

Extent

5 pages

Language

eng

Publisher

Northern Illinois University

Rights Statement

In Copyright

Rights Statement 2

NIU theses are protected by copyright. They may be viewed from Huskie Commons for any purpose, but reproduction or distribution in any format is prohibited without the written permission of the authors.

Media Type

Text