dc.contributor.advisor: Patil, Hemant A.
dc.contributor.author: Prajapati, Gauri P.
dc.date.accessioned: 2022-05-06T14:38:42Z
dc.date.available: 2023-02-27T14:38:42Z
dc.date.issued: 2021
dc.identifier.citation: Prajapati, Gauri P. (2021). Design of Voice Privacy System. Dhirubhai Ambani Institute of Information and Communication Technology. xiv, 92 p. (Acc.No: T00983)
dc.identifier.uri: http://drsr.daiict.ac.in//handle/123456789/1066
dc.description.abstract: The extensive use of Intelligent Personal Assistants (IPAs) and biometrics in day-to-day life calls for privacy preservation when handling personal data. To that end, efforts have been made to protect the personally identifiable characteristics of human voices using different techniques. Anonymization is one such technique: only the speaker-related characteristics of the speech are modified, so that the resulting voice sounds as if a different speaker had spoken the same utterance. A system that incorporates such anonymization methods to protect speaker identity in voice data is known as a voice privacy system. Various anonymization approaches based on different Voice Conversion (VC) methods have been explored, in which the original speaker’s voice is converted to a pseudo-speaker’s voice. However, it is challenging to anonymize the original voice while preserving the quality of the anonymized voice. This thesis investigates modifications of several kinds of speaker-specific information for voice privacy systems. The formant bandwidth, speaking rate, pitch, and Mel cepstral coefficients are chosen for anonymization. Formant bandwidth is related to various speaker-specific losses introduced during speech production, such as lip radiation and wall vibration. Hence, the formant bandwidth is modified, with the help of Linear Prediction (LP) analysis of the speech signal, to anonymize the original voice. To obtain more intelligible anonymized voices with better speaker anonymity, three perturbation approaches were investigated, namely speed, tempo, and pitch perturbation. They alter the pitch and/or the speaking rate, providing time-scale and pitch modification of the original voices for anonymization. However, to introduce some non-linearity into the system, so that an attacker finds it difficult to undo the anonymization, the CycleGAN framework is proposed.
The Cycle-Consistent Generative Adversarial Network (CycleGAN) is used to transform the speaker’s gender, as well as other prosodic aspects, using the Mel cepstral coefficients (MCEPs) and the fundamental frequency (F0). Speaker anonymization and intelligibility are measured objectively using Automatic Speaker Verification (ASV) and Automatic Speech Recognition (ASR) experiments, respectively, on the development and test sets of the LibriSpeech and VCTK datasets. A low Word Error Rate (WER) reflects the preservation of linguistic content, and a higher Equal Error Rate (EER) reflects the ability of the voice privacy system to hide the original speaker’s characteristics.
dc.subject: Voice Privacy
dc.subject: Intelligent Personal Assistants
dc.subject: Mel cepstral coefficients
dc.subject: Formant bandwidth
dc.subject: Linear Prediction
dc.subject: Speech signal
dc.subject: Librispeech
dc.classification.ddc: 006.42 PRA
dc.title: Design of Voice Privacy System
dc.type: Dissertation
dc.degree: M. Tech
dc.accession.number: T00983
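
The abstract evaluates anonymization with two objective metrics: the word error rate from ASR and the equal error rate from ASV. The following is a minimal sketch of how these two quantities are commonly computed, not the scoring pipeline used in the thesis. The function names `word_error_rate` and `equal_error_rate` are hypothetical, and the EER is approximated by a simple sweep over the observed scores rather than by interpolating the error curves.

```python
import numpy as np


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def equal_error_rate(target_scores, nontarget_scores) -> float:
    """Approximate EER: the operating point where the false acceptance
    rate and the false rejection rate are closest to each other."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    # Assumption: a trial is accepted when its score >= threshold.
    thresholds = np.unique(np.concatenate([target, nontarget]))
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    frr = np.array([(target < t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))
    return float((far[idx] + frr[idx]) / 2.0)


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not results from the thesis.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    print(equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))
```

Under this reading, a voice privacy system aims to keep `word_error_rate` low on the anonymized speech, so the words remain recognizable, while pushing `equal_error_rate` up for a verifier that tries to match the anonymized voice to the original speaker.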

