Features for Speech Emotion Recognition

Uthiraa, S.

dc.contributor.advisor	Patil, Hemant A.
dc.contributor.author	Uthiraa, S.
dc.date.accessioned	2024-08-22T05:21:25Z
dc.date.available	2024-08-22T05:21:25Z
dc.date.issued	2023
dc.identifier.citation	Uthiraa, S. (2023). Features for Speech Emotion Recognition. Dhirubhai Ambani Institute of Information and Communication Technology. xiii, 109 p. (Acc. # T01139).
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/1198
dc.description.abstract	Speech is the easiest and most natural means of communication, and its emotional content makes interpersonal communication effective. As technology advances, human dependence on machines is also increasing, making it imperative to establish efficient methods for Speech Emotion Recognition (SER) to ensure effective human-machine interaction. This thesis focuses on understanding the acoustic characteristics of various emotions and their dependence on the culture and language used. It then proposes two new feature sets, Constant Q Pitch Coefficients (CQPC) and Constant Q Harmonic Coefficients (CQHC), derived from the Constant Q Transform, which capture high-resolution pitch and harmonic information, respectively. Further, this thesis focuses on less-explored excitation source-based features and proposes a novel Linear Frequency Residual Cepstral Coefficients (LFRCC) feature set for this purpose. Phase-based features, namely Modified Group Delay Cepstral Coefficients (MGDCC), are proposed to capture vocal tract and vocal fold information for emotion classification. The recently developed Automatic Speech Recognition (ASR) model, Whisper, is used to analyze cross-database SER. This thesis also extends the LFRCC idea to the infant cry classification problem. Lastly, a local API is developed for SER.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Speech Emotion Recognition
dc.subject	Constant Q Pitch Coefficients
dc.subject	Constant Q Harmonic Coefficients
dc.subject	Linear Frequency Residual Cepstral Coefficients
dc.subject	Modified Group Delay Cepstral Coefficients
dc.subject	Whisper
dc.subject	GMM
dc.subject	CNN
dc.subject	ResNet
dc.subject	TDNN
dc.classification.ddc	006.454 UTH
dc.title	Features for Speech Emotion Recognition
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	202111065
dc.accession.number	T01139

Files in this item

Name: 202111065.pdf
Size: 12.36Mb
Format: PDF

This item appears in the following Collection(s)

M Tech Dissertations [923]
