Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/1198
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | Patil, Hemant A. | -
dc.contributor.author | Uthiraa, S. | -
dc.date.accessioned | 2024-08-22T05:21:25Z | -
dc.date.available | 2024-08-22T05:21:25Z | -
dc.date.issued | 2023 | -
dc.identifier.citation | Uthiraa, S. (2023). Features for Speech Emotion Recognition. Dhirubhai Ambani Institute of Information and Communication Technology. xiii, 109 p. (Acc. # T01139). | -
dc.identifier.uri | http://drsr.daiict.ac.in//handle/123456789/1198 | -
dc.description.abstract | The easiest and most effective or natural way of communication is through speech; the emotional aspect of speech leads to effective interpersonal communication. As technological advancements continue to proliferate, the dependence of humans on machines is also increasing, thereby making it imperative to establish efficient methods for Speech Emotion Recognition (SER) to ensure effective human-machine interaction. This thesis focuses on understanding acoustic characteristics of various emotions and their dependence on the culture and language used. It then proposes a new feature set, namely, Constant Q Pitch Coefficients (CQPC) and Constant Q Harmonic Coefficients (CQHC) from Constant Q Transform, which captures high resolution pitch and harmonic information, respectively. Further, this thesis focuses on less explored excitation source-based features and proposes a novel Linear Frequency Residual Cepstral Coefficients (LFRCC) feature set for the same. Phase-based features, namely Modified Group Delay Cepstral Coefficients (MGDCC), are proposed to capture vocal tract and vocal fold information well for emotion classification. The recently developed Automatic Speech Recognition (ASR) model, Whisper, is used to analyze cross-database SER. This thesis extends the LFRCC idea to the infant cry classification problem. Lastly, a local API is developed for SER. | -
dc.publisher | Dhirubhai Ambani Institute of Information and Communication Technology | -
dc.subject | Speech Emotion Recognition | -
dc.subject | Constant Q Pitch Coefficients | -
dc.subject | Constant Q Harmonic Coefficients | -
dc.subject | Linear Frequency Residual Cepstral Coefficients | -
dc.subject | Modified Group Delay Cepstral Coefficients | -
dc.subject | Whisper | -
dc.subject | GMM | -
dc.subject | CNN | -
dc.subject | ResNet | -
dc.subject | TDNN | -
dc.classification.ddc | 006.454 UTH | -
dc.title | Features for Speech Emotion Recognition | -
dc.type | Dissertation | -
dc.degree | M. Tech | -
dc.student.id | 202111065 | -
dc.accession.number | T01139 | -
Appears in Collections:M Tech Dissertations

Files in This Item:
File | Size | Format
202111065.pdf | 12.67 MB | Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.