dc.contributor.advisor: Patil, Hemant A.
dc.contributor.author: Uthiraa, S.
dc.date.accessioned: 2024-08-22T05:21:25Z
dc.date.available: 2024-08-22T05:21:25Z
dc.date.issued: 2023
dc.identifier.citation: Uthiraa, S. (2023). Features for Speech Emotion Recognition. Dhirubhai Ambani Institute of Information and Communication Technology. xiii, 109 p. (Acc. # T01139).
dc.identifier.uri: http://drsr.daiict.ac.in//handle/123456789/1198
dc.description.abstract: Speech is the easiest, most effective, and most natural means of communication; its emotional aspect contributes to effective interpersonal communication. As technological advancements continue to proliferate, the dependence of humans on machines is also increasing, making it imperative to establish efficient methods for Speech Emotion Recognition (SER) to ensure effective human-machine interaction. This thesis focuses on understanding the acoustic characteristics of various emotions and their dependence on the culture and language used. It then proposes two new feature sets derived from the Constant Q Transform, namely Constant Q Pitch Coefficients (CQPC) and Constant Q Harmonic Coefficients (CQHC), which capture high-resolution pitch and harmonic information, respectively. Further, this thesis focuses on less explored excitation source-based features and proposes a novel Linear Frequency Residual Cepstral Coefficients (LFRCC) feature set for the same. Phase-based features, namely Modified Group Delay Cepstral Coefficients (MGDCC), are proposed to capture vocal tract and vocal fold information for emotion classification. The recently developed Automatic Speech Recognition (ASR) model, Whisper, is used to analyze cross-database SER. This thesis also extends the LFRCC idea to the infant cry classification problem. Lastly, a local API is developed for SER.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Speech Emotion Recognition
dc.subject: Constant Q Pitch Coefficients
dc.subject: Constant Q Harmonic Coefficients
dc.subject: Linear Frequency Residual Cepstral Coefficients
dc.subject: Modified Group Delay Cepstral Coefficients
dc.subject: Whisper
dc.subject: GMM
dc.subject: CNN
dc.subject: ResNet
dc.subject: TDNN
dc.classification.ddc: 006.454 UTH
dc.title: Features for Speech Emotion Recognition
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 202111065
dc.accession.number: T01139

