Feature for Live and Spoofed Speech Detection

Gupta, Priyanka

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Gupta, Priyanka
dc.date.accessioned	2024-08-22T05:21:30Z
dc.date.available	2024-08-22T05:21:30Z
dc.date.issued	2023
dc.identifier.citation	Gupta, Priyanka (2023). Feature for Live and Spoofed Speech Detection. Dhirubhai Ambani Institute of Information and Communication Technology. xxvi, 246 p. (Acc. # T01091).
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/1213
dc.description.abstract	The authorization to access specific information is given by a biometric system. Biometric systems are used for security purposes in that they prevent unauthorized access to important information or data (information privacy). A biometric system grants access by capturing human traits that make each individual unique w.r.t. that particular trait. This thesis focuses on voice-based biometric systems, also known as Automatic Speaker Verification (ASV) systems, given that speech is the most natural and powerful form of communication used by humans to communicate with the outside world. It is the most intuitive, simple, and easy-to-produce characteristic. Since ASV systems are used in applications such as banking transactions and access to buildings associated with classified information, only authorized (legitimate or genuine) users must be granted access. ASV systems suffer from vulnerabilities to attacks and can be compromised at various stages. The attacks may be categorized as direct and indirect attacks, depending on the extent of the attacker's access to the ASV framework. Besides, due to the recent commercial success of several Intelligent Personal Assistants (IPAs), also known as voice assistants, such as Speech Interpretation and Recognition Interface (SIRI), Amazon Alexa, Google Home, and so on, many voice-enabled devices in the Internet of Things (IoT) have become prone to spoofing attacks. To that effect, there is active research in the direction of designing countermeasure systems for ASV systems, particularly for spoofing attacks, namely, Speech Synthesis (SS), Voice Conversion (VC), and replay. This thesis is a humble attempt to alleviate some of the research gaps in designing features for countermeasure systems. In particular, this thesis proposes the Quadrature Energy Separation Algorithm (QESA), which incorporates the quadrature-phase component along with the in-phase component of the signal. 
To that effect, an existing feature set for replay Spoofed Speech Detection (SSD), namely, CFCCIF-ESA, is extended to the CFCCIF-QESA feature set for enhanced performance of the countermeasure system. The performance of the proposed CFCCIF-QESA feature set is evaluated on various datasets for various spoofing attacks given in the literature. Furthermore, the existing Linear Frequency Residual Cepstral Coefficients (LFRCC) feature set is optimized w.r.t. its Linear Prediction (LP) order for the replay SSD task. In particular, it is found that the LP order needed for a good prediction of speech is not the same as that needed for the replay SSD task. The resulting optimized LFRCC feature set is evaluated on the ASVSpoof 2019 PA dataset. In addition to this, another feature, known as the uncertainty vector (u-vector), is developed from Heisenberg's uncertainty principle in the signal processing framework. The proposed u-vector is evaluated using the ASVSpoof 2017 dataset for replay attacks. Furthermore, to make countermeasure systems independent of the type of spoofing attack, features have been proposed for the Voice Liveness Detection (VLD) task. VLD is performed by detecting pop noise, which is the discriminating acoustic cue present in live speech, produced by the breathing effect captured by the microphone when the speaker's mouth is close to the microphone. The work on VLD in this thesis is based on two key hypotheses: the first is Parseval's energy equivalence for the STFT, CWT, and analytic CWT, and the second is that the energy of pop noise decreases as the distance between the speaker and the microphone used to capture genuine speech increases. 
The proposed features for VLD in this thesis are wavelet-based, wherein three wavelets are used, namely, the Bump, Morlet, and Morse wavelets, where the Morse wavelet is presented as a superfamily of analytic wavelets called Generalized Morse Wavelets (GMWs). A detailed experimental analysis is presented, covering speaker-microphone proximity, the effect of phoneme type, and the effect of frequency range. Apart from this, the security of speech data is also taken into account, and this thesis proposes an improved Voice Privacy (VP) system, which is based on Linear Prediction (LP) of speech. Furthermore, the VP system is studied from the attacker's perspective using the target selection approach, and in particular, target selection w.r.t. twins is studied, wherein the most vulnerable twin-pair (i.e., target) is selected. Lastly, some of the proposed feature sets in this thesis are also evaluated for tasks related to other Assistive Speech Technologies (AST) applications, such as the classification of healthy vs. pathological infant cries, and dysarthric severity-level classification.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Voice Liveness Detection (VLD)
dc.subject	Gaussian Mixture Model (GMM)
dc.subject	Voice Privacy
dc.subject	Infant Cry Classification
dc.subject	Morse Wavelet
dc.subject	Deepfake detectors
dc.subject	Teager energy operator
dc.classification.ddc	006.454 GUP
dc.title	Feature for Live and Spoofed Speech Detection
dc.type	Thesis
dc.degree	PhD
dc.student.id	201721001
dc.accession.number	T01091

Files in this item

Name: 201721001.pdf
Size: 14.99Mb
Format: PDF
