Design of countermeasures for spoofed speech detection system

Patel, Tanvina

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/643

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Patel, Tanvina
dc.date.accessioned	2018-05-17T08:40:14Z
dc.date.available	2018-05-17T08:40:14Z
dc.date.issued	2017
dc.identifier.citation	Tanvina Patel (2017). Design of countermeasures for spoofed speech detection system. Dhirubhai Ambani Institute of Information and Communication Technology. xxx, 225 p. (Acc.No: T00606)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/643
dc.description.abstract	Automatic Speaker Verification (ASV) systems are vulnerable to spoofing attacks that use speech synthesis and voice conversion techniques. Recently, to encourage the development of anti-spoofing measures, or countermeasures, for the Spoofed Speech Detection (SSD) task, a standardized dataset was released for the ASVspoof 2015 challenge held at INTERSPEECH 2015. In the present work, using a traditional Gaussian Mixture Model (GMM)-based classification system, novel countermeasures are proposed that consider three vital aspects of the speech production mechanism: the excitation source, the vocal tract system (i.e., the filter), and Source-Filter (S-F) interaction. Since our system performed relatively best at the ASVspoof 2015 challenge, we first discuss system-based features, including the proposed Cochlear Filter Cepstral Coefficients and Instantaneous Frequency (CFCCIF) features. These use the envelope and average IF of each subband along with transient information. Estimating the transient variations with a symmetric difference (CFCCIFS) gave better discrimination. Within the framework of system-based features, the Subband Autoencoder (SBAE) feature set, which embeds subband processing in the autoencoder architecture, is also used. For source-based features, since actual vocal fold movement is absent in machine-generated speech, the fundamental frequency (F0) contour and the Strength of Excitation (SoE) are used. Next, since spoofed speech is either easy to predict (if generated by a simplified model) or difficult to predict (due to artifacts), we propose prediction-based methods. These include Linear Prediction (LP), Long-Term Prediction (LTP), and Non-Linear Prediction (NLP) techniques. Lastly, the Fujisaki model is used to analyze prosodic differences between natural and spoofed speech in terms of accent and phrase components.
In addition to using source or system features independently, features capturing their time-varying dependencies, i.e., S-F interaction, are considered. These include features based on the residual information between the glottal excitation source and its fitted Liljencrants-Fant (LF) model, in both the time domain and the frequency domain, for the SSD task.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	ASV Systems
dc.subject	Spoofed Speech Detection
dc.subject	Voice Conversion
dc.subject	Mel Frequency Cepstral Coefficients
dc.subject	Fujisaki Model
dc.classification.ddc	621.382 PAT
dc.title	Design of countermeasures for spoofed speech detection system
dc.type	Thesis
dc.degree	Ph.D
dc.student.id	201221003
dc.accession.number	T00606
Appears in Collections:	PhD Theses
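The abstract states that CFCCIFS estimates transient variations with a symmetric difference. As a rough illustration only, the sketch below applies a generic symmetric (central) difference, d[t] = (x[t+1] - x[t-1]) / 2, along the time axis of a frames-by-coefficients feature matrix. The helper name `symmetric_difference`, the edge padding, and the factor of 1/2 are assumptions; the exact formulation used in the thesis is not given in this record.

```python
import numpy as np

def symmetric_difference(features: np.ndarray) -> np.ndarray:
    """Hypothetical helper: central-difference estimate of transient variation.

    features: array of shape (num_frames, num_coeffs).
    Returns d[t] = (x[t+1] - x[t-1]) / 2 for every frame. The first and last
    frames are replicated (edge padding) so the output keeps the input shape.
    """
    x = np.asarray(features, dtype=float)
    # Replicate the first and last frames so every frame has two neighbours.
    padded = np.pad(x, ((1, 1), (0, 0)), mode="edge")
    return (padded[2:] - padded[:-2]) / 2.0

# Toy feature matrix: 4 frames x 2 coefficients (illustrative values only).
feats = np.array([[0.0, 1.0],
                  [1.0, 3.0],
                  [4.0, 5.0],
                  [9.0, 7.0]])
print(symmetric_difference(feats))
```

Appending such deltas to the static coefficients (for example `np.hstack([feats, symmetric_difference(feats)])`) is one common way to expose transient information to a frame-level classifier such as a GMM.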

Files in This Item:

File	Description	Size	Format
201221003.pdf	201221003	8.87 MB	Adobe PDF
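The abstract motivates prediction-based countermeasures by noting that spoofed speech tends to be either unusually easy or unusually hard to predict. A minimal sketch of one way to quantify predictability, assuming autocorrelation-method Linear Prediction (LP), is the ratio of LP residual energy to frame energy. The helper names, the LP order of 4, the 8 kHz sampling rate, and the synthetic test frames are illustrative assumptions, not the thesis's actual feature set; LTP and NLP are not covered here.

```python
import numpy as np

def lp_coefficients(frame: np.ndarray, order: int) -> np.ndarray:
    """Hypothetical helper: autocorrelation-method LP coefficients a_1..a_order."""
    x = np.asarray(frame, dtype=float)
    n = len(x)
    # Biased autocorrelation r[0..order] of the (rectangular-windowed) frame.
    r = np.array([np.dot(x[: n - k], x[k:]) for k in range(order + 1)])
    # Normal equations R a = r[1:], with symmetric Toeplitz R[i, j] = r[|i - j|].
    idx = np.abs(np.subtract.outer(np.arange(order), np.arange(order)))
    return np.linalg.solve(r[idx], r[1:])

def lp_residual_energy_ratio(frame: np.ndarray, order: int = 4) -> float:
    """Hypothetical score: residual energy / frame energy (near 0 = very predictable)."""
    x = np.asarray(frame, dtype=float)
    a = lp_coefficients(x, order)
    # x_hat[n] = sum_{k=1..order} a_k * x[n - k]; samples before the frame are zero.
    x_hat = np.convolve(x, np.concatenate(([0.0], a)))[: len(x)]
    residual = x - x_hat
    return float(np.sum(residual ** 2) / np.sum(x ** 2))

# Synthetic 50 ms frames at an assumed 8 kHz sampling rate (illustration only).
n = np.arange(400)
sine = np.sin(2 * np.pi * 200 * n / 8000)              # highly predictable
noise = np.random.default_rng(0).standard_normal(400)  # nearly unpredictable

sine_ratio = lp_residual_energy_ratio(sine)
noise_ratio = lp_residual_energy_ratio(noise)
print(f"sine: {sine_ratio:.4f}, noise: {noise_ratio:.4f}")
```

Because the abstract frames the criterion in both directions, such a score would presumably serve as one input to a classifier like the GMM mentioned above rather than as a stand-alone decision rule.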
