Design of robust automatic speaker verification system in adverse conditions

Rajpura, Divyesh G.

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Rajpura, Divyesh G.
dc.date.accessioned	2020-09-22T19:13:10Z
dc.date.available	2023-02-16T19:13:10Z
dc.date.issued	2020
dc.identifier.citation	Rajpura, Divyesh G. (2020). Design of robust automatic speaker verification system in adverse conditions. Dhirubhai Ambani Institute of Information and Communication Technology. xii, 69 p. (Acc.No: T00861)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/939
dc.description.abstract	Automatic Speaker Verification (ASV) aims to verify a person's identity from his/her voice with the help of machines. It has become an essential component of many speech-related applications due to its use as a biometric authentication system. Traditional approaches to ASV perform well on clean, high-quality, near-field speech. However, ASV remains challenging under adverse conditions, such as noisy environments, mismatched conditions, far-field speech, and short-duration speech. This thesis investigates the robustness of traditional ASV approaches in adverse conditions. In contrast to near-field speech, the far-field speech signal is degraded by reverberation, the proximity of the microphones, and the quality of the recording devices. Therefore, to reduce the effects of noise in far-field speech, we investigate Teager Energy Operator (TEO)-based feature sets, namely, Instantaneous Amplitude Cepstral Coefficients (IACC) and Instantaneous Frequency Cepstral Coefficients (IFCC), along with the conventional Mel Frequency Cepstral Coefficients (MFCC) feature set. Short-duration utterances are a common problem in real-life applications of ASV. Finding speaking patterns and extracting speaker-specific information from short utterances is difficult due to limited phonetic variability. In this context, we analyze the robustness of various statistical and Deep Neural Network (DNN)-based speaker representations, namely, i-vector, x-vector, and d-vector. Another contribution of this thesis is in the field of Voice Conversion (VC). Singing Voice Conversion (SVC) in the presence of background music has a wide range of applications, such as dubbing of songs and singing voice synthesis. However, recent approaches to SVC do not pay much attention to the background music. To address this issue, we propose a new framework consisting of music separation followed by voice conversion. Due to the limited availability of speaker-specific data, we also perform an extensive analysis using different transfer learning and fine-tuning-based systems.
dc.subject	Automatic Speaker Recognition (ASR)
dc.subject	Automatic Speaker Verification (ASV)
dc.subject	Speaker Diarization (SD)
dc.subject	Deep Neural Network (DNN)
dc.subject	Voice Conversion (VC)
dc.classification.ddc	610.28 RAJ
dc.title	Design of robust automatic speaker verification system in adverse conditions
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201811032
dc.accession.number	T00861

Files in this item

Name: 201811032.pdf
Size: 3.846Mb
Format: PDF

This item appears in the following Collection(s)

M Tech Dissertations [923]
