    Design of countermeasures for replay spoof speech attack

    View/Open
    201611061 (2.497Mb)
    Date
    2018
    Author
    Tak, Hemlata
    Abstract
    An Automatic Speaker Verification (ASV) system is a biometric authentication system that verifies a claimed speaker's identity from his or her voice. ASV systems are vulnerable to various types of spoofing attacks, such as impersonation, speech synthesis (SS), voice conversion (VC), replay, and twins. Replay attacks pose one of the most difficult challenges for the use of ASV systems in practical scenarios, as they require neither specific expert knowledge nor advanced equipment. In this work, we present a standalone replay Spoof Speech Detection (SSD) task to classify natural vs. replayed speech. In earlier studies, researchers mainly used vocal tract system-based (segmental) information for replay SSD. However, during the replay mechanism, excitation source-based information is also affected by the recording environment and the replay devices (in particular, the pitch (F0) source harmonics are degraded in the higher frequency regions). Hence, in this thesis, we explore an excitation source-based feature set along with system-based features for the replay SSD task. In particular, we propose novel Linear Frequency Residual Cepstral Coefficients (LFRCC) for the replay SSD task. The objective of using this feature set is to explore possible complementary excitation source information present in Linear Prediction (LP) residual-based features. In addition, we propose system-based features, namely Instantaneous Amplitude (IA) and Instantaneous Frequency (IF) features, obtained using the Hilbert Transform (HT) demodulation technique. The resulting HT-based Instantaneous Amplitude Cepstral Coefficients (IACC) and Instantaneous Frequency Cepstral Coefficients (IFCC) feature sets capture the information present in the slowly varying envelope and in the fast-varying changes in frequency, respectively.
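The HT demodulation step described above can be illustrated with a minimal sketch; this is not the thesis implementation. It assumes NumPy is available, builds the analytic signal with an FFT, and takes IA as the magnitude of that signal and IF as the scaled sample-to-sample change of its unwrapped phase. The function names, the synthetic AM test tone, and the sampling rate are assumptions made purely for illustration. Any subband filtering and cepstral processing needed to turn IA and IF into IACC and IFCC is not shown.

```python
import numpy as np


def analytic_signal(x):
    """Return the analytic signal of a real sequence via the FFT.

    Negative-frequency bins are zeroed and positive-frequency bins doubled,
    which is the standard discrete Hilbert-transform construction.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0                      # keep DC as is
    if n % 2 == 0:
        gain[n // 2] = 1.0             # keep the Nyquist bin as is
        gain[1:n // 2] = 2.0           # double positive frequencies
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)


def instantaneous_amplitude_and_frequency(x, fs):
    """Return (IA, IF in Hz); IF has one sample fewer than IA."""
    z = analytic_signal(x)
    ia = np.abs(z)                                   # slowly varying envelope
    phase = np.unwrap(np.angle(z))                   # continuous phase in rad
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # phase slope in Hz
    return ia, inst_freq


# Hypothetical test signal: a 100 Hz carrier with a 5 Hz amplitude envelope.
fs = 1000                                  # assumed sampling rate in Hz
t = np.arange(fs) / fs                     # one second of samples
envelope = 1.0 + 0.5 * np.cos(2.0 * np.pi * 5.0 * t)
x = envelope * np.cos(2.0 * np.pi * 100.0 * t)

ia, inst_freq = instantaneous_amplitude_and_frequency(x, fs)
```

On this synthetic tone, `ia` tracks the 5 Hz envelope and `inst_freq` stays at the carrier frequency, which matches the split between a slowly varying envelope and fast-varying frequency content described in the abstract.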
Experiments were performed on the ASVspoof 2017 Challenge database with Gaussian Mixture Model (GMM) and Convolutional Neural Network (CNN) classifiers. Score-level fusion of the source-based and system-based features significantly improved performance. Furthermore, for a fixed feature set, fusing the GMM and CNN classifiers at the score level yielded a significant reduction in the Equal Error Rate (EER, in %). Finally, we also analyze the effect of classifier-level fusion on the replay SSD task.
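Score-level fusion and the EER metric mentioned above can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions, not the thesis procedure: the fusion is assumed to be a weighted sum with a hypothetical weight `alpha`, higher scores are assumed to mean "more likely natural speech", and the EER is approximated by sweeping thresholds over the observed scores. All scores below are made-up toy numbers, not results from the thesis.

```python
def fuse_scores(scores_a, scores_b, alpha):
    """Weighted-sum score-level fusion; alpha weights system A, (1 - alpha) system B."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems must score the same trials")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(scores_a, scores_b)]


def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate EER in percent.

    A trial is accepted as genuine when its score is >= the threshold.
    The threshold where false acceptance and false rejection rates are
    closest is used, and the mean of the two rates is reported.
    """
    thresholds = sorted(set(genuine_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 0.0
    for th in thresholds:
        far = sum(s >= th for s in spoof_scores) / len(spoof_scores)
        frr = sum(g < th for g in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return 100.0 * eer


# Toy scores (invented for illustration) from two hypothetical classifiers.
gmm_genuine, gmm_spoof = [2.0, 1.0, -0.5], [-2.0, -1.0, 0.0]
cnn_genuine, cnn_spoof = [1.0, -0.5, 2.0], [0.0, -2.0, -1.0]

alpha = 0.5  # assumed fusion weight; in practice it would be tuned on held-out data
fused_genuine = fuse_scores(gmm_genuine, cnn_genuine, alpha)
fused_spoof = fuse_scores(gmm_spoof, cnn_spoof, alpha)

gmm_eer = equal_error_rate(gmm_genuine, gmm_spoof)
cnn_eer = equal_error_rate(cnn_genuine, cnn_spoof)
fused_eer = equal_error_rate(fused_genuine, fused_spoof)
```

In this toy setup each classifier misranks a different trial, so the fused scores separate the two classes better than either system alone. Real scores would need to be calibrated or normalized to a common range before a weighted sum is meaningful.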
    URI
    http://drsr.daiict.ac.in//handle/123456789/776
    Collections
    • M Tech Dissertations [923]

    Resource Centre copyright © 2006-2017 
    Contact Us | Send Feedback
    Theme by 
    Atmire NV
     

     

