Significance of Teager Energy Operator for Speech Applications

Therattil, Anand Saju

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/1137

Title:	Significance of Teager Energy Operator for Speech Applications
Authors:	Patil, Hemant A.; Therattil, Anand Saju
Keywords:	Replay Spoof Speech; Teager Energy Cepstral Coefficients; Cross-Teager Energy Cepstral Coefficients; Squared Energy Cepstral Coefficients; Dysarthria
Issue Date:	2022
Publisher:	Dhirubhai Ambani Institute of Information and Communication Technology
Citation:	Therattil, Anand Saju (2022). Significance of Teager Energy Operator for Speech Applications. Dhirubhai Ambani Institute of Information and Communication Technology. xii, 70 p. (Acc. # T01057).
Abstract:	Speech is used in various applications beyond voice communication, such as pathology detection, severity-level classification of dysarthria, and replay spoof speech detection for voice biometrics and voice assistants. The first part of this thesis deals with the development of a countermeasure (CM) system for replay Spoof Speech Detection (SSD). A replay attack on voice biometrics refers to a fraudulent attempt by an imposter to spoof another person's identity by replaying pre-recorded voice samples in front of an Automatic Speaker Verification (ASV) system or Voice Assistants (VAs). The thesis also studies and analyses dysarthria, a neuromotor speech disorder, using various speech processing and deep learning approaches. Conditions such as dysarthria, Parkinson's disease, and cerebral palsy impair neuromotor functions of the human body and give rise to atypical speech. Among these, dysarthria is one of the most common. Analysis of a patient's dysarthric condition depends on the severity level, which is generally assessed by Speech-Language Pathologists (SLPs). However, to make the assessment immune to human biases and errors, this thesis is oriented towards developing a severity-level classification system for dysarthric speech using signal processing and deep learning approaches. It presents an analysis of dysarthric vs. normal speech using Teager Energy Operator (TEO)-based Teager Energy Cepstral Coefficients (TECC) and Squared Energy Operator (SEO)-based Squared Energy Cepstral Coefficients (SECC) as the front-end features. These features are provided as input to deep learning and pattern recognition models, which predict the severity-level class for dysarthria. Lastly, the generalization of the countermeasure system for replay attacks on ASV systems and VAs is analysed using the TEO-based TECC feature set.
The generalization of the CM system is presented through cross-database evaluation between the Voice Spoofing Detection Corpus (VSDC), ASVspoof 2017 version 2.0, and ASVspoof 2019 PA datasets. Further, an analysis of One-Point Replay (1PR) and Two-Point Replay (2PR) is presented in this thesis.
URI:	http://drsr.daiict.ac.in//handle/123456789/1137
Appears in Collections:	M Tech (EC) Dissertations

Files in This Item:

File	Size	Format
202015005.pdf	3.04 MB	Adobe PDF
