Browsing by Subject "Speech recognition"

Auditory representation learning

Sailor, Hardik B. (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)

Representation learning (RL), or feature learning, has a major impact on signal processing applications. The goal of RL approaches is to learn meaningful representations directly from the data that can be ...

Gaussian mixture models for spoken language identification

Manwani, Naresh (Dhirubhai Ambani Institute of Information and Communication Technology, 2006)

Language Identification (LID) is the problem of identifying the language of any spoken utterance irrespective of the topic, speaker, or duration of the speech. Although a very large amount of work has been done for ...

Generative Adversarial Networks for Speech Technology Applications

Shah, Neil (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)

The deep learning renaissance has enabled machines to understand observed data in terms of a hierarchy of representations. This allows machines to learn complicated nonlinear relationships between the representative ...

Hybrid approach to speech recognition in multi-speaker environment

Trivedi, Jigish S. (Dhirubhai Ambani Institute of Information and Communication Technology, 2004)

Recognition of voice in a multi-speaker environment involves speech separation, speech feature extraction, and speech feature matching. Traditionally, Vector Quantization is one of the algorithms used for speaker recognition. ...

Person recognition from their hum

Madhavi, Maulik C. (Dhirubhai Ambani Institute of Information and Communication Technology, 2011)

In this thesis, the design of a person recognition system based on a person's hum is presented. As a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach in this work ...

Speech driven facial animation system

Singh, Archana (Dhirubhai Ambani Institute of Information and Communication Technology, 2006)

This thesis is concerned with the problem of synthesizing an animated face driven by a new audio sequence that is not present in the previously recorded database. The main focus of the thesis is on exploring the efficient ...

Unsupervised speaker-invariant feature representations for QbE-STD

R., Sreeraj (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)

Query-by-Example Spoken Term Detection (QbE-STD) is the task of retrieving audio documents relevant to a spoken user query from a large collection of audio data. The idea in QbE-STD is to match the audio documents ...

Vowel landmark detection for speech recognition

Undhad, Ankur G. (Dhirubhai Ambani Institute of Information and Communication Technology, 2014)

Landmarks are the time instants in a speech utterance that mark important events such as vowels, glides, and consonants. This thesis proposes a novel Vowel Landmark Detection (VLD) algorithm to locate vowel landmarks ...