Speech enhancement using microphone array for hands-free speech applications
Abstract
This thesis addresses the problem of multi-microphone speech enhancement using an optimal filtering algorithm based on the generalized singular value decomposition (GSVD). This algorithm does not require any sensitive geometric information about the array layout and is hence more robust to deviations from the assumed signal model (e.g. look direction error, microphone mismatch, speech detection errors) than conventional multi-microphone noise reduction techniques such as beamforming. However, the high computational complexity of this algorithm makes it unsuitable for practical implementation.
The work presented in this thesis discusses a recursive version of this algorithm in which, as new data arrives, the GSVD of the data and noise matrices at any instant is updated from the GSVD of the matrices available at the previous instant. It is shown that this recursive GSVD updating scheme drastically reduces the computational complexity of the algorithm, making it amenable to practical implementation. Various issues related to its implementation are addressed.
This thesis also explores in detail the possibility of further reducing the computational complexity by incorporating the GSVD-based optimal filtering algorithm in a Generalized Sidelobe Canceller (GSC) type structure, without causing any performance degradation in terms of background noise reduction and speech quality.
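As a rough illustration of the kind of filter the abstract refers to, the sketch below computes a multichannel MMSE (Wiener-type) filter W = Ryy^{-1} (Ryy - Rvv) from noisy frames and noise-only frames. It assumes that speech and noise are uncorrelated and that the noise-only frames have been picked out by a speech detector. The thesis obtains its filter through the GSVD of the data and noise matrices and updates it recursively; this sketch instead solves the equivalent linear system directly, so it is not the thesis's implementation. The function names (`correlation`, `solve`, `optimal_filter`, `enhance`) are hypothetical and chosen only for this example.

```python
def correlation(frames):
    """Sample correlation matrix (1/n) * X^T X of a list of M-channel frames."""
    n, m = len(frames), len(frames[0])
    return [[sum(f[i] * f[j] for f in frames) / n for j in range(m)]
            for i in range(m)]


def solve(a, b):
    """Solve A X = B for X by Gauss-Jordan elimination with partial pivoting."""
    m = len(a)
    # Augmented matrix [A | B]; row copies keep the inputs untouched.
    aug = [list(a[i]) + list(b[i]) for i in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def optimal_filter(noisy_frames, noise_frames):
    """Return the M x M filter W = Ryy^{-1} (Ryy - Rvv).

    Assumption: speech and noise are uncorrelated, so Ryy - Rvv
    estimates the speech correlation matrix.
    """
    ryy = correlation(noisy_frames)
    rvv = correlation(noise_frames)
    m = len(ryy)
    diff = [[ryy[i][j] - rvv[i][j] for j in range(m)] for i in range(m)]
    return solve(ryy, diff)


def enhance(frame, w):
    """Apply the filter to one M-channel frame: z = W^T y."""
    m = len(frame)
    return [sum(w[i][j] * frame[i] for i in range(m)) for j in range(m)]
```

Two limiting cases give a quick sanity check: if the noise-only frames are identical to the noisy frames, the filter is all zeros (everything is treated as noise), and if the noise frames are all zero, the filter reduces to the identity and each frame passes through unchanged.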
Collections
- M Tech Dissertations