Phase-based Features for Replay Spoof Detection System
Abstract
An Automatic Speaker Verification (ASV) system verifies the claimed identity of a speaker from speech samples. Spoofing attacks have increased with significant technological advances, which has motivated researchers to investigate countermeasures. Replay is a spoofing attack in which the ASV system is fooled with pre-recorded speech samples of a target speaker. Most state-of-the-art methods focus on magnitude-based features to discriminate natural speech from spoofed speech. However, both magnitude spectrum-based and phase spectrum-based features are affected by the quality of the intermediate devices and by the noise level of the recording and playback environments. Only a few studies in the literature have reported the use of phase-based features for replay spoof detection. In this thesis, we explore the relative significance of various phase-based features for the replay Spoof Speech Detection (SSD) task. To capture possible complementary information, we perform score-level fusion between phase-based and magnitude spectrum-based features, and also among the phase-based features themselves. For various combinations of magnitude-based and phase-based features, the Equal Error Rate (EER) is significantly lower than with the individual feature sets alone. Score-level fusion among phase-based features gives better performance on the evaluation sets of the ASVspoof 2017 Challenge version 1 and version 2 databases. A Gaussian Mixture Model (GMM) is used as the classifier for all experiments presented in this thesis.
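As an illustration of the two evaluation ideas named in the abstract, the following minimal Python sketch shows one common way to do score-level fusion and to estimate the EER. It is not the thesis implementation: the weighted-sum fusion rule, the weight `alpha`, the function names `fuse_scores` and `compute_eer`, and the toy scores are all assumptions made for this example. Scores are assumed to follow the usual convention that a higher value means "more likely genuine".

```python
import numpy as np


def fuse_scores(scores_a, scores_b, alpha=0.5):
    """Weighted-sum fusion of two systems' scores for the same trials.

    alpha is a hypothetical fusion weight; the abstract does not state
    which fusion rule or weight the thesis uses.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("both systems must score the same trials")
    return alpha * a + (1.0 - alpha) * b


def compute_eer(genuine_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores."""
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.unique(np.concatenate([genuine, spoof]))
    # False acceptance: spoofed trial scored at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # False rejection: genuine trial scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)


# Toy, made-up scores for two hypothetical feature sets (not thesis data).
phase_genuine, phase_spoof = [0.0, 2.0, 3.0, 4.0], [-1.0, -0.5, 0.5, 1.0]
mag_genuine, mag_spoof = [2.0, 0.0, 3.0, 4.0], [0.5, 1.0, -1.0, -0.5]

fused_genuine = fuse_scores(phase_genuine, mag_genuine, alpha=0.5)
fused_spoof = fuse_scores(phase_spoof, mag_spoof, alpha=0.5)

print("phase EER:", compute_eer(phase_genuine, phase_spoof))
print("magnitude EER:", compute_eer(mag_genuine, mag_spoof))
print("fused EER:", compute_eer(fused_genuine, fused_spoof))
```

The toy scores are constructed so that the two systems make errors on different trials, which is the situation in which fusion can lower the EER. In practice the fusion weight would typically be tuned on a development set rather than fixed in advance.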
Collections
- M Tech Dissertations [923]