Design of countermeasures for replay spoof speech attack

Tak, Hemlata

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Tak, Hemlata
dc.date.accessioned	2019-03-19T09:31:00Z
dc.date.available	2019-03-19T09:31:00Z
dc.date.issued	2018
dc.identifier.citation	Tak, Hemlata (2018). Design of Countermeasures for Replay Spoof Speech Attack. Dhirubhai Ambani Institute of Information and Communication Technology, xvi, 70 p. (Acc. No: T00742)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/776
dc.description.abstract	An Automatic Speaker Verification (ASV) system is a biometric person authentication system that verifies a claimed speaker's identity from his/her voice with the help of machines. ASV systems are vulnerable to various types of spoofing attacks, such as impersonation, speech synthesis (SS), voice conversion (VC), replay, and twins. The replay attack poses one of the most difficult challenges for the use of ASV systems in practical scenarios, as it requires neither specific expert knowledge nor advanced equipment. In this work, we present a standalone replay Spoof Speech Detection (SSD) task to classify natural vs. replayed speech. In earlier studies, researchers mainly used vocal tract system-based (segmental) information for replay SSD. However, during the replay mechanism, excitation source-based information is also affected (in particular, the pitch (F0) source harmonics degrade in the higher frequency regions) due to the recording environment and replay devices. Hence, in this thesis, we explored an excitation source-based feature set along with system-based features for the replay SSD task. In particular, we proposed the novel Linear Frequency Residual Cepstral Coefficients (LFRCC) for the replay SSD task. The objective of using this feature set is to explore possible complementary excitation source information present in Linear Prediction (LP) residual-based features. In addition, we proposed system-based features, namely, Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) features obtained using the Hilbert Transform (HT) demodulation technique. These HT-based Instantaneous Amplitude Cepstral Coefficients (IACC) and Instantaneous Frequency Cepstral Coefficients (IFCC) feature sets are able to capture the information present in the slowly-varying envelope and the fast-varying changes in frequency.
Experiments were performed on the ASVspoof 2017 Challenge database with Gaussian Mixture Model (GMM) and Convolutional Neural Network (CNN) classifiers. The score-level fusion of source-based and system-based features significantly improved the performance. Furthermore, for a fixed feature set, fusing the GMM and CNN classifiers at the score level yielded a significant reduction in % Equal Error Rate (EER). We have also analyzed the effect of classifier-level fusion for the replay SSD task.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Spoofing countermeasures
dc.subject	Speech Processing
dc.subject	Speaker verification
dc.subject	Voice authentication
dc.classification.ddc	006.3 TAK
dc.title	Design of countermeasures for replay spoof speech attack
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201611061
dc.accession.number	T00742
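
The abstract mentions score-level fusion and % Equal Error Rate (EER) without giving the exact recipe. The following is a minimal sketch, assuming a simple weighted-sum fusion and a threshold sweep for EER. The function names, the weight `alpha`, and the toy scores are all hypothetical illustrations; they are not taken from the thesis and are not results on the ASVspoof 2017 data.

```python
def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted-sum score-level fusion (assumed rule, not from the thesis).

    Each fused score is alpha * a + (1 - alpha) * b for the same trial.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must be the same length")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate % EER by sweeping a threshold over all observed scores.

    A trial is accepted as genuine when score >= threshold. The EER is taken
    at the threshold where false acceptance and false rejection rates are
    closest, as the mean of the two rates.
    """
    best_gap, eer = None, None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        # Spoofed trials wrongly accepted as genuine.
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        # Genuine trials wrongly rejected.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return 100.0 * eer


# Made-up toy scores (higher means "more likely genuine"); illustration only.
gmm_genuine = [2.0, 1.5, 0.2, 1.8]
gmm_spoof = [-1.0, 0.5, -0.3, -2.0]
cnn_genuine = [0.9, 0.3, 1.1, 1.4]
cnn_spoof = [-0.5, -1.2, 0.6, -0.8]

fused_genuine = fuse_scores(gmm_genuine, cnn_genuine, alpha=0.5)
fused_spoof = fuse_scores(gmm_spoof, cnn_spoof, alpha=0.5)

print("GMM   % EER:", equal_error_rate(gmm_genuine, gmm_spoof))
print("CNN   % EER:", equal_error_rate(cnn_genuine, cnn_spoof))
print("Fused % EER:", equal_error_rate(fused_genuine, fused_spoof))
```

In this toy case each classifier misranks a different trial, so averaging their scores separates the two classes. In practice the fusion weight would be tuned on a development set, and the scores of the two classifiers would typically be normalized to a comparable range before fusion.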

Files in this item

Name: 201611061_Hemlata Tak.pdf
Size: 2.497Mb
Format: PDF
Description: 201611061

This item appears in the following Collection(s)

M Tech Dissertations [923]