Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/1066
Title: Design of Voice Privacy System
Authors: Patil, Hemant A.
Prajapati, Gauri P.
Keywords: Voice Privacy
Intelligent Personal Assistants
Mel cepstral coefficients
Formant bandwidth
Linear Prediction
Speech signal
Librispeech
Issue Date: 2021
Citation: Prajapati, Gauri P. (2021). Design of Voice Privacy System. Dhirubhai Ambani Institute of Information and Communication Technology. xiv, 92 p. (Acc.No: T00983)
Abstract: Extensive use of Intelligent Personal Assistants (IPA) and biometrics in our day-to-day life calls for privacy preservation when dealing with personal data. To that effect, efforts have been made to protect the personally identifiable characteristics of human voices using different techniques. Anonymization is one such technique, in which only speaker-related speech characteristics are modified, so that the resultant voice sounds as if a different speaker has spoken the same utterance. A system that incorporates these anonymization methods to protect speaker identity in voice data is known as a voice privacy system. Various anonymization approaches have been explored based on different Voice Conversion (VC) methods, where the original speaker’s voice is converted to a pseudo speaker’s voice. However, it is challenging to anonymize the original voice while preserving the quality of the anonymized voice. This thesis investigates modifications of various kinds of speaker-specific information for voice privacy systems. The formant bandwidth, speaking rate, pitch, and Mel cepstral coefficients are chosen for anonymization. Formant bandwidth is related to various speaker-specific losses introduced during speech production, such as lip radiation, wall vibration, etc. Hence, formant bandwidth is modified to anonymize the original voice with the help of Linear Prediction (LP) analysis of the speech signal. To obtain more intelligible anonymized voices with better speaker anonymity, three perturbation approaches were investigated, namely, speed, tempo, and pitch perturbation. They alter the pitch and/or speaking rate, providing time-scale and pitch modification of the original voices for anonymization. However, to introduce some non-linearity into the system, so that an attacker finds it difficult to undo the anonymization process, the CycleGAN framework is proposed.
The Cycle-Consistent Generative Adversarial Network (CycleGAN) is used to modify (transform) the speaker’s gender as well as other prosodic aspects using the Mel cepstral coefficients (MCEPs) and F0. Speaker anonymization and intelligibility are measured objectively using Automatic Speaker Verification (ASV) and Automatic Speech Recognition (ASR) experiments, respectively, on the development and test sets of the Librispeech and VCTK datasets. A low word error rate (WER) reflects the preservation of linguistic content, and a higher equal error rate (EER) reflects the ability of the voice privacy system to hide the original speaker's characteristics.
URI: http://drsr.daiict.ac.in//handle/123456789/1066
Appears in Collections: M Tech Dissertations

Files in This Item:
File: 201911058_Gauri_Prajapati - hemant patil.pdf (Restricted Access)
Size: 18.53 MB
Format: Adobe PDF

