Publication:
On Significance of Constant-Q Transform for Pop Noise Detection

dc.contributor.affiliation DA-IICT, Gandhinagar
dc.contributor.author Khoria, Kuldeep
dc.contributor.author Patil, Ankur T
dc.contributor.author Patil, Hemant
dc.contributor.researcher Khoria, Kuldeep (201911014)
dc.contributor.researcher Patil, Ankur T (201621008)
dc.date.accessioned 2025-08-01T13:09:02Z
dc.date.issued 11-06-2023
dc.description.abstract Liveness detection has emerged as an important research issue for many biometrics, such as face, iris, and hand geometry, and significant research efforts are reported in the literature. However, less emphasis has been given to liveness detection for voice biometrics, or Automatic Speaker Verification (ASV). Voice Liveness Detection (VLD) is a potential technique for detecting spoofing attacks on ASV systems. The presence of pop noise in the speech signal of a live speaker provides a discriminative acoustic cue to distinguish genuine vs. spoofed speech in the VLD framework. Pop noise emerges as a burst at the lips and is captured by the ASV system (since the speaker and microphone are close enough); it indicates the liveness of the speaker and provides the basis of VLD. In this paper, we present a Constant-Q Transform (CQT)-based approach as an alternative to the traditional Short-Time Fourier Transform (STFT)-based algorithm (baseline). Subject to Heisenberg's uncertainty principle in the signal processing framework, the CQT has variable spectro-temporal resolution, in particular, better frequency resolution in the low-frequency region and better temporal resolution in the high-frequency region, which can be effectively utilized to identify the low-frequency characteristics of pop noise. We have also compared the proposed algorithm with cepstral features, namely, Linear Frequency Cepstral Coefficients (LFCC) and Constant-Q Cepstral Coefficients. The experiments are performed on the recently released POp noise COrpus (POCO) dataset with various statistical, discriminative, and deep learning-based classifiers, namely, Gaussian Mixture Model (GMM), Support Vector Machine (SVM), Convolutional Neural Network (CNN), Light-CNN (LCNN), and Residual Network (ResNet).
A significant improvement in performance, in particular, an absolute improvement of 14.23% and 10.95% in percentage classification accuracy on the development and evaluation sets, respectively, is obtained for the proposed CQT-based algorithm with the SVM classifier over the STFT-SVM (baseline) system. A similar trend of performance improvement is observed for the GMM, CNN, LCNN, and ResNet classifiers for the proposed CQT-based algorithm vs. the traditional STFT-based algorithm. The analysis is further extended by simulating the replay mechanism (in the standard framework of the ASVSpoof-2019 PA challenge dataset) on a subset of the POCO dataset in order to observe the effect of room acoustics on the performance of the VLD system. By embedding the moderate simulated replay mechanism in the POCO dataset, we obtained a percentage classification accuracy of 97.82% on the evaluation set.
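The abstract's point about variable spectro-temporal resolution can be illustrated with a minimal sketch. It is not the authors' implementation. It uses the standard constant-Q relations (geometrically spaced center frequencies, a fixed ratio Q of center frequency to bandwidth, and a window length inversely proportional to frequency). The function name and all parameter values (minimum frequency, bins per octave, number of bins, sampling rate) are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def cqt_bin_resolution(f_min=20.0, bins_per_octave=12, n_bins=96, sample_rate=16000):
    """Return center frequency (Hz), bandwidth (Hz), and window length (samples) per CQT bin.

    Hypothetical helper; the default values are illustrative assumptions.
    """
    k = np.arange(n_bins)
    # Geometrically spaced center frequencies: f_k = f_min * 2^(k / B)
    center_freqs = f_min * 2.0 ** (k / bins_per_octave)
    # Constant quality factor: Q = f_k / delta_f_k = 1 / (2^(1/B) - 1)
    q_factor = 1.0 / (2.0 ** (1.0 / bins_per_octave) - 1.0)
    # Bandwidth grows with frequency, so low bins have finer frequency resolution
    bandwidths = center_freqs / q_factor
    # Window length shrinks with frequency, so high bins have finer time resolution
    window_lengths = np.ceil(q_factor * sample_rate / center_freqs).astype(int)
    return center_freqs, bandwidths, window_lengths

freqs, bws, wins = cqt_bin_resolution()
for i in (0, len(freqs) // 2, len(freqs) - 1):
    print(f"bin {i}: f = {freqs[i]:.1f} Hz, bandwidth = {bws[i]:.2f} Hz, window = {wins[i]} samples")
```

In this sketch the lowest bins get the narrowest bandwidths and the longest windows, which is the property the abstract relies on for capturing the low-frequency characteristics of pop noise. An STFT, by contrast, uses one window length and one bandwidth for every bin.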
dc.format.extent 1-26.
dc.identifier.citation Kuldeep Khoria, Ankur T. Patil, and Hemant A. Patil, "On Significance of Constant-Q Transform for Pop Noise Detection," Computer, Speech and Language, Elsevier, ISSN 0885-2308, vol. 77, Jan. 2023, article no. 101421, pp. 1-26, doi: 10.1016/j.csl.2022.101421. [Published Date: 11 Jun. 2022]
dc.identifier.doi 10.1016/j.csl.2022.101421
dc.identifier.issn 1095-8363
dc.identifier.scopus 2-s2.0-85133964664
dc.identifier.uri https://ir.daiict.ac.in/handle/dau.ir/1563
dc.identifier.wos WOS:000822549700002
dc.language.iso en
dc.publisher Elsevier
dc.relation.ispartofseries Vol. 77; No.
dc.source Computer, Speech and Language
dc.source.uri https://www.sciencedirect.com/science/article/pii/S0885230822000547?via%3Dihub
dc.title On Significance of Constant-Q Transform for Pop Noise Detection
dspace.entity.type Publication
relation.isAuthorOfPublication fdb7041b-280e-498b-b2ee-34f9bc351f4c
relation.isAuthorOfPublication.latestForDiscovery fdb7041b-280e-498b-b2ee-34f9bc351f4c