On Significance of Constant-Q Transform for Pop Noise Detection

Khoria, Kuldeep; Patil, Ankur T; Patil, Hemant

dc.contributor.affiliation	DA-IICT, Gandhinagar
dc.contributor.author	Khoria, Kuldeep
dc.contributor.author	Patil, Ankur T
dc.contributor.author	Patil, Hemant
dc.contributor.researcher	Khoria, Kuldeep (201911014)
dc.contributor.researcher	Patil, Ankur T (201621008)
dc.date.accessioned	2025-08-01T13:09:02Z
dc.date.issued	11-06-2023
dc.description.abstract	Liveness detection has emerged as an important research issue for many biometrics, such as face, iris, and hand geometry, and significant research efforts are reported in the literature. However, less emphasis has been given to liveness detection for voice biometrics, or Automatic Speaker Verification (ASV). Voice Liveness Detection (VLD) is a potential technique for detecting spoofing attacks on an ASV system. The presence of pop noise in the speech signal of a live speaker provides a discriminative acoustic cue for distinguishing genuine vs. spoofed speech in the VLD framework. Pop noise emerges as a burst at the lips and is captured by the ASV system because the speaker and microphone are close to each other; it indicates the liveness of the speaker and provides the basis of VLD. In this paper, we present a Constant-Q Transform (CQT)-based approach as an alternative to the traditional Short-Time Fourier Transform (STFT)-based algorithm (baseline). Under Heisenberg's uncertainty principle in the signal processing framework, the CQT has variable spectro-temporal resolution, in particular, better frequency resolution in the low-frequency region and better temporal resolution in the high-frequency region, which can be effectively utilized to identify the low-frequency characteristics of pop noise. We also compare the proposed algorithm with cepstral features, namely, Linear Frequency Cepstral Coefficients (LFCC) and Constant-Q Cepstral Coefficients. The experiments are performed on the recently released POp noise COrpus (POCO) dataset with various statistical, discriminative, and deep learning-based classifiers, namely, Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Light-CNN (LCNN), and Residual Network (ResNet).
A significant improvement in performance, in particular an absolute improvement of 14.23% and 10.95% in percentage classification accuracy on the development and evaluation sets, respectively, is obtained for the proposed CQT-based algorithm with the SVM classifier over the STFT-SVM (baseline) system. A similar trend of performance improvement is observed for the GMM, CNN, LCNN, and ResNet classifiers for the proposed CQT-based algorithm vs. the traditional STFT-based algorithm. The analysis is further extended by simulating the replay mechanism (in the standard framework of the ASVSpoof-2019 PA challenge dataset) on a subset of the POCO dataset in order to observe the effect of room acoustics on the performance of the VLD system. By embedding the moderate simulated replay mechanism in the POCO dataset, we obtained a percentage classification accuracy of 97.82% on the evaluation set.
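The resolution trade-off described in the abstract can be illustrated with a minimal sketch. It uses the textbook constant-Q definitions (geometrically spaced center frequencies, a fixed ratio Q of center frequency to bandwidth), not the paper's implementation. The function names and all parameter values below (10 Hz minimum frequency, 12 bins per octave, 96 bins, 16 kHz sampling rate, 512-sample STFT window) are assumptions chosen for illustration; this record does not state the settings used by the authors.

```python
import math


def cqt_resolution(f_min, bins_per_octave, n_bins, sample_rate):
    """Per-bin (center_hz, bandwidth_hz, window_seconds) for a constant-Q analysis.

    Hypothetical helper for illustration only.
    """
    # Q is the same for every bin: center frequency divided by bandwidth.
    q = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    rows = []
    for k in range(n_bins):
        center_hz = f_min * 2.0 ** (k / bins_per_octave)
        bandwidth_hz = center_hz / q
        # A lower center frequency needs a longer analysis window.
        window_samples = math.ceil(q * sample_rate / center_hz)
        rows.append((center_hz, bandwidth_hz, window_samples / sample_rate))
    return rows


def stft_resolution(window_samples, sample_rate):
    """Fixed (bin_spacing_hz, window_seconds) shared by every STFT bin."""
    return sample_rate / window_samples, window_samples / sample_rate


# Assumed illustrative settings, not taken from the paper.
cqt_rows = cqt_resolution(f_min=10.0, bins_per_octave=12, n_bins=96, sample_rate=16000)
stft_df, stft_dt = stft_resolution(window_samples=512, sample_rate=16000)

for center_hz, bandwidth_hz, window_s in (cqt_rows[0], cqt_rows[-1]):
    print(f"CQT bin at {center_hz:8.1f} Hz: bandwidth {bandwidth_hz:7.2f} Hz, window {window_s:.4f} s")
print(f"STFT (all bins): spacing {stft_df:.2f} Hz, window {stft_dt:.4f} s")
```

Under these assumed settings, the lowest CQT bin is narrower in frequency than the fixed STFT bin spacing and the highest CQT bin uses a shorter window than the STFT, which is the variable spectro-temporal resolution the abstract refers to.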
dc.format.extent	1-26.
dc.identifier.citation	Kuldeep Khoria, Ankur T. Patil, and Hemant A. Patil, "On Significance of Constant-Q Transform for Pop Noise Detection," Computer, Speech and Language, Elsevier, ISSN 0885-2308, vol. 77, Jan. 2023, article no. 101421, pp. 1-26, doi: 10.1016/j.csl.2022.101421. [Published Date: 11 Jun. 2022]
dc.identifier.doi	10.1016/j.csl.2022.101421
dc.identifier.issn	1095-8363
dc.identifier.scopus	2-s2.0-85133964664
dc.identifier.uri	https://ir.daiict.ac.in/handle/dau.ir/1563
dc.identifier.wos	WOS:000822549700002
dc.language.iso	en
dc.publisher	Elsevier
dc.relation.ispartofseries	Vol. 77; No.
dc.source	Computer, Speech and Language
dc.source.uri	https://www.sciencedirect.com/science/article/pii/S0885230822000547?via%3Dihub
dc.title	On Significance of Constant-Q Transform for Pop Noise Detection
dspace.entity.type	Publication
relation.isAuthorOfPublication	fdb7041b-280e-498b-b2ee-34f9bc351f4c
relation.isAuthorOfPublication.latestForDiscovery	fdb7041b-280e-498b-b2ee-34f9bc351f4c

Collections

Journal Article
