Replay spoof detection on voice assistants and automatic speaker verification systems

Acharya, Rajul

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Acharya, Rajul
dc.date.accessioned	2020-09-22T16:57:47Z
dc.date.available	2023-02-16T16:57:47Z
dc.date.issued	2020
dc.identifier.citation	Acharya, Rajul (2020). Replay spoof detection on voice assistants and automatic speaker verification systems. Dhirubhai Ambani Institute of Information and Communication Technology. xiii, 76 p. (Acc.No: T00850)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/928
dc.description.abstract	Automatic Speaker Verification (ASV) systems and Voice Assistants (VAs) are highly vulnerable to spoofing attacks. Spoofing refers to an intentional circumvention wherein an imposter tries to manipulate a biometric system by masquerading as another genuinely enrolled person. ASV systems are vulnerable to five kinds of spoofing attacks, namely, Speech Synthesis (SS), Voice Conversion (VC), Impersonation, Twins, and Replay. A replay attack on a voice biometric system is a fraudulent attempt by an imposter to spoof another person's identity by replaying pre-recorded voice samples in front of an ASV system. Among all the spoofing attacks, the replay attack is the simplest to execute but hard to detect. In particular, a replay attack on an ASV system or a VA carried out with a high-quality recording and playback device is very hard to detect because the replayed speech is very similar to that of the genuine speaker. Given the vulnerability of ASV and VA systems to replay spoofing attacks, this thesis aims at developing effective countermeasures to protect these systems from such malicious attempts. In this thesis, five novel feature sets are developed for the replay spoof detection task. Of these five, the first three, namely, Cochlear Filter Cepstral Coefficients Instantaneous Frequency using Energy Separation Algorithm (CFCCIF-ESA), Enhanced Teager Energy Cepstral Coefficients (ETECC), and u-vector, are used for replay detection on ASV systems, whereas Cross-Teager Energy Cepstral Coefficients (CTECC) and Spectral Root Cepstral Coefficients (SRCC) are used for replay detection on VA systems. The performance of the proposed feature sets is evaluated using two datasets, namely, the ASVspoof 2017 version 2.0 dataset for replay detection on ASV systems, and the Realistic Replay Attack Microphone Array Speech Corpus (ReMASC) for replay detection on VA systems. The results obtained are compared against the baseline Constant Q Cepstral Coefficients (CQCC), Linear Frequency Cepstral Coefficients (LFCC), and state-of-the-art Mel Frequency Cepstral Coefficients (MFCC) feature sets.
dc.subject	Automatic Speaker Verification (ASV) systems
dc.subject	Voice Assistant
dc.subject	Cochlear Filter Cepstral Coefficients Instantaneous Frequency using Energy Separation Algorithm (CFCCIF-ESA)
dc.subject	Enhanced Teager Energy Cepstral Coefficients (ETECC)
dc.subject	Cross-Teager Energy Cepstral Coefficients (CTECC)
dc.subject	Spectral Root Cepstral Coefficients (SRCC)
dc.subject	Constant Q Cepstral Coefficients (CQCC)
dc.subject	Linear Frequency Cepstral Coefficients (LFCC)
dc.subject	Mel Frequency Cepstral Coefficients (MFCC)
dc.classification.ddc	006.454 ACH
dc.title	Replay spoof detection on voice assistants and automatic speaker verification systems
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	201811019
dc.accession.number	T00850

Files in this item

Name: 201811019.pdf
Size: 6.446Mb
Format: PDF

This item appears in the following Collection(s)

M Tech Dissertations [923]

Related items

Significance of Teager Energy Operator for Speech Applications

Features for Speech Emotion Recognition

Handcrafted Feature Design for Voice Liveness Detection and Countermeasures for Spoof Attacks
