    • Auditory representation learning 

      Sailor, Hardik B. (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
      Representation learning (RL), or feature learning, has had a large impact on signal processing applications. The goal of RL approaches is to learn a meaningful representation directly from the data that can be ...
    • Gaussian mixture models for spoken language identification 

      Manwani, Naresh (Dhirubhai Ambani Institute of Information and Communication Technology, 2006)
      Language Identification (LID) is the problem of identifying the language of any spoken utterance irrespective of the topic, speaker or the duration of the speech. Although a huge amount of work has been done for ...
    • Generative Adversarial Networks for Speech Technology Applications 

      Shah, Neil (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
      The deep learning renaissance has enabled machines to understand observed data in terms of a hierarchy of representations. This allows machines to learn complicated nonlinear relationships between the representative ...
    • Hybrid approach to speech recognition in multi-speaker environment 

      Trivedi, Jigish S. (Dhirubhai Ambani Institute of Information and Communication Technology, 2004)
      Recognition of voice in a multi-speaker environment involves speech separation, speech feature extraction and speech feature matching. Traditionally, Vector Quantization is one of the algorithms used for speaker recognition. ...
    • Person recognition from their hum 

      Madhavi, Maulik C. (Dhirubhai Ambani Institute of Information and Communication Technology, 2011)
      In this thesis, the design of a person recognition system based on a person's hum is presented. As a hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach in this work ...
    • Speech driven facial animation system 

      Singh, Archana (Dhirubhai Ambani Institute of Information and Communication Technology, 2006)
      This thesis is concerned with the problem of synthesizing an animated face driven by a new audio sequence that is not present in the previously recorded database. The main focus of the thesis is on exploring the efficient ...
    • Unsupervised speaker-invariant feature representations for QbE-STD 

      R., Sreeraj (Dhirubhai Ambani Institute of Information and Communication Technology, 2018)
      Query-by-Example Spoken Term Detection (QbE-STD) is the task of retrieving audio documents relevant to the user query in spoken form, from a huge collection of audio data. The idea in QbE-STD is to match the audio documents ...
    • Vowel landmark detection for speech recognition 

      Undhad, Ankur G. (Dhirubhai Ambani Institute of Information and Communication Technology, 2014)
      Landmarks are the time instants in a speech utterance that mark important events such as vowels, glides and consonants. This thesis proposes a novel Vowel Landmark Detection (VLD) algorithm to locate vowel landmarks ...