Design of Voice Privacy System

Prajapati, Gauri P.

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/1066

Title:	Design of Voice Privacy System
Authors:	Patil, Hemant A.; Prajapati, Gauri P.
Keywords:	Voice Privacy; Intelligent Personal Assistants; Mel cepstral coefficients; Formant bandwidth; Linear Prediction; Speech signal; Librispeech
Issue Date:	2021
Citation:	Prajapati, Gauri P. (2021). Design of Voice Privacy System. Dhirubhai Ambani Institute of Information and Communication Technology. xiv, 92 p. (Acc.No: T00983)
Abstract:	The extensive use of Intelligent Personal Assistants (IPAs) and biometrics in day-to-day life calls for privacy preservation when dealing with personal data. To that effect, efforts have been made to protect the personally identifiable characteristics of human voices using different techniques. Anonymization is one such technique, in which only speaker-related speech characteristics are modified, so that the resulting voice sounds as if a different speaker had spoken the same utterance. A system that incorporates these anonymization methods to protect speaker identity in voice data is known as a voice privacy system. Various anonymization approaches have been explored based on different Voice Conversion (VC) methods, where the original speaker’s voice is converted to a pseudo-speaker’s voice. However, it is challenging to anonymize the original voice while preserving the quality of the anonymized voice. This thesis investigates modifications of various kinds of speaker-specific information for voice privacy systems. The formant bandwidth, speaking rate, pitch, and Mel cepstral coefficients are chosen for anonymization. Formant bandwidth is related to various speaker-specific losses introduced during speech production, such as lip radiation and wall vibration. Hence, formant bandwidth is modified to anonymize the original voice with the help of Linear Prediction (LP) analysis of the speech signal. To obtain more intelligible anonymized voices with better speaker anonymity, three perturbation approaches were investigated, namely, speed, tempo, and pitch perturbation. They alter the pitch and/or speaking rate, providing time-scale and pitch modification of the original voices for anonymization. However, to introduce some non-linearity into the system, so that an attacker finds it difficult to undo the anonymization process, the CycleGAN framework is proposed.
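The abstract does not give the exact procedure used for formant bandwidth modification, so the following is only a minimal sketch of one common LP-domain way to do it. It assumes the bandwidth is changed by scaling every pole radius of the all-pole LP filter by a factor `gamma` (replacing A(z) with A(z/gamma)), which leaves the pole angles, and hence the formant frequencies, unchanged. The function names, the toy one-resonance filter, and the value of `gamma` are illustrative assumptions, and the frame-wise LP analysis and resynthesis of real speech are omitted.

```python
import numpy as np

def scale_pole_radii(lp_coeffs, gamma):
    """Return the coefficients of A(z/gamma).

    lp_coeffs is [1, a1, ..., ap] for A(z) = 1 + a1 z^-1 + ... + ap z^-p.
    Multiplying a_k by gamma**k multiplies every pole radius by gamma.
    """
    a = np.asarray(lp_coeffs, dtype=float)
    return a * gamma ** np.arange(len(a))

def pole_bandwidths_hz(lp_coeffs, fs):
    """Bandwidth estimate per pole: B = -(fs / pi) * ln(|pole|)."""
    poles = np.roots(lp_coeffs)
    return -(fs / np.pi) * np.log(np.abs(poles))

# Toy example (assumed values): one resonance at 1 kHz, pole radius 0.97.
fs = 16000
radius, freq_hz = 0.97, 1000.0
theta = 2 * np.pi * freq_hz / fs
a = np.array([1.0, -2 * radius * np.cos(theta), radius ** 2])

gamma = 0.95  # gamma < 1 widens the bandwidth, gamma > 1 narrows it
a_mod = scale_pole_radii(a, gamma)

bw_before = pole_bandwidths_hz(a, fs)
bw_after = pole_bandwidths_hz(a_mod, fs)
```

In this sketch each bandwidth grows by -(fs / pi) * ln(gamma) Hz, while the resonance stays at the same frequency.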
The Cycle-Consistent Generative Adversarial Network (CycleGAN) is used to modify (transform) the speaker’s gender as well as other prosodic aspects using the Mel cepstral coefficients (MCEPs) and the fundamental frequency (F0). Speaker anonymization and intelligibility are measured objectively using Automatic Speaker Verification (ASV) and Automatic Speech Recognition (ASR) experiments, respectively, on the development and test sets of the Librispeech and VCTK datasets. A low Word Error Rate (WER) reflects the preservation of linguistic content, and a higher Equal Error Rate (EER) reflects the ability of the voice privacy system to hide the original speaker’s characteristics.
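To make the EER metric concrete, here is a small sketch of how it can be computed from ASV trial scores. It assumes higher scores mean "same speaker" and uses a simple threshold sweep over the observed scores; the function name and the toy scores are hypothetical, and evaluation toolkits typically use a more refined interpolation.

```python
import numpy as np

def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: the error rate where false acceptance equals false rejection."""
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, nontarget]))
    # False rejection: a genuine (target) trial scores below the threshold.
    frr = np.array([(target < t).mean() for t in thresholds])
    # False acceptance: an impostor (non-target) trial scores at or above it.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2

# Toy scores (made up): well-separated trials vs. overlapping trials.
eer_separable = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
eer_overlapping = equal_error_rate([0.6, 0.4], [0.5, 0.3])
```

When the verifier separates genuine and impostor trials cleanly, the EER is near zero; successful anonymization pushes the scores of the original speaker's trials toward the impostor range, which raises the EER.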
URI:	http://drsr.daiict.ac.in//handle/123456789/1066
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201911058_Gauri_Prajapati - hemant patil.pdf Restricted Access		18.53 MB	Adobe PDF
