Combining evidences from magnitude and phase information using VTEO for person recognition using humming

Madhavi, Maulik C; Patil, Hemant

dc.contributor.affiliation	DA-IICT, Gandhinagar
dc.contributor.author	Madhavi, Maulik C
dc.contributor.author	Patil, Hemant
dc.contributor.researcher	Madhavi, Maulik C (200911036)
dc.date.accessioned	2025-08-01T13:09:01Z
dc.date.issued	01-11-2018
dc.description.abstract	Most state-of-the-art speaker recognition systems use natural speech signals (i.e., real speech, spontaneous speech or contextual speech) from the subjects. In this paper, recognition of a person is attempted from his or her hum with the help of machines. This kind of application can be useful for designing a person-dependent Query-by-Humming (QBH) system and hence can play an important role in music information retrieval (MIR) systems. In addition, it can also be useful for other speech technology applications such as human-computer interaction, speech prosody analysis of disordered speech, and speaker forensics. This paper develops a new feature extraction technique to exploit perceptually meaningful (due to mel frequency warping, which imitates the human perception process for hearing) phase spectrum information along with magnitude spectrum information from the hum signal. In particular, the structure of the state-of-the-art feature set, namely, Mel Frequency Cepstral Coefficients (MFCCs), is modified to capture the phase spectrum information. In addition, a new energy measure, namely, the Variable length Teager Energy Operator (VTEO), is employed to compute subband energies of different time-domain subband signals (i.e., the outputs of the 24 triangular-shaped filters used in the mel filterbank). We refer to this proposed feature set as MFCC-VTMP (i.e., mel frequency cepstral coefficients to capture perceptually meaningful magnitude and phase information via VTEO). The polynomial classifier (which is in principle similar to other discriminatively trained classifiers such as a support vector machine (SVM) with a polynomial kernel) is used as the basis for all the experiments.
The effectiveness of the proposed feature set is evaluated and consistently found to be better than the MFCC feature set across several evaluation factors, such as comparison with other phase-based features, the order of the polynomial classifier, the person (speaker) modeling approach (such as GMM-UBM and i-vector), the dimension of the feature vector, robustness under signal degradation conditions, static vs. dynamic features, feature discrimination measures and intersession variability.
dc.format.extent	225-256
dc.identifier.citation	Patil, Hemant A, and Maulik C. Madhavi, "Combining evidences from magnitude and phase information using VTEO for person recognition using humming," Computer Speech & Language, Vol. 52, Nov. 2018, pp. 225-256. doi: 10.1016/j.csl.2017.06.009.
dc.identifier.doi	10.1016/j.csl.2017.06.009
dc.identifier.issn	1095-8363
dc.identifier.scopus	2-s2.0-85030319887
dc.identifier.uri	https://ir.daiict.ac.in/handle/dau.ir/1545
dc.identifier.wos	WOS:000440789000012
dc.language.iso	en
dc.publisher	Elsevier
dc.relation.ispartofseries	Vol. 52; No.
dc.source	Computer Speech & Language
dc.source.uri	https://www.sciencedirect.com/science/article/pii/S0885230816303102?via%3Dihub
dc.title	Combining evidences from magnitude and phase information using VTEO for person recognition using humming
dspace.entity.type	Publication
relation.isAuthorOfPublication	fdb7041b-280e-498b-b2ee-34f9bc351f4c
relation.isAuthorOfPublication.latestForDiscovery	fdb7041b-280e-498b-b2ee-34f9bc351f4c
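
Note on the abstract: the Variable length Teager Energy Operator (VTEO) it names is commonly defined as psi_i[x(n)] = x(n)^2 - x(n-i) * x(n+i), where i is a dependency index and i = 1 gives the classical Teager Energy Operator. The following is a minimal sketch in Python, assuming that definition. The function name `vteo` and the NumPy-based implementation are illustrative and are not taken from the paper, and this record does not specify how the operator output is pooled into subband energies.

```python
import numpy as np


def vteo(x, i=1):
    """Variable length Teager Energy Operator (assumed definition).

    Computes psi_i[x(n)] = x(n)^2 - x(n - i) * x(n + i) for every sample
    that has i neighbours on both sides, so the output has len(x) - 2*i
    samples. The name and signature are hypothetical, for illustration.
    """
    x = np.asarray(x, dtype=float)
    if i < 1:
        raise ValueError("dependency index i must be >= 1")
    if x.size <= 2 * i:
        raise ValueError("signal is too short for this dependency index")
    centre = x[i:-i]       # x(n)
    behind = x[:-2 * i]    # x(n - i)
    ahead = x[2 * i:]      # x(n + i)
    return centre ** 2 - behind * ahead


if __name__ == "__main__":
    # Synthetic test tone; amplitude and frequency are arbitrary choices.
    n = np.arange(200)
    tone = 2.0 * np.cos(0.3 * n + 0.1)
    for dep in (1, 3):
        print(dep, vteo(tone, dep)[:3])
```

For a pure sinusoid A*cos(W*n + p), this operator returns the constant A^2 * sin^2(i*W) at every interior sample, which is one way to sanity-check an implementation before applying it to filterbank outputs.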

Collections

Journal Article
