Analysis of nonlinearity in speech production mechanism for speaker verification: phase-based approach
Abstract
Many real-world signal processing problems can be described using linear models and realized as analog or digital time-invariant filters with finite or infinite impulse response (FIR or IIR). In the recent past, a nonlinear operator called the Teager Energy Operator (TEO) has been introduced and investigated; because it uses only a small window in the time domain, it is well suited to local time analysis of signals.
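To make the "small window" property concrete, the following is a minimal sketch of the standard discrete-time TEO, psi[x(n)] = x(n)^2 - x(n-1) x(n+1), which needs only three consecutive samples per output value. The function name `teager_energy` and the test-tone parameters are illustrative assumptions and are not taken from the thesis.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator for interior samples.

    psi[n] = x[n]^2 - x[n-1] * x[n+1]
    The output is two samples shorter than the input because the
    first and last samples have no neighbor on one side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Illustrative test tone (hypothetical values): A * cos(omega * n + phi).
amplitude, omega, phase = 2.0, 0.3, 0.5
n = np.arange(200)
tone = amplitude * np.cos(omega * n + phase)

psi = teager_energy(tone)

# For a pure sinusoid the TEO output is the constant A^2 * sin^2(omega).
expected = amplitude ** 2 * np.sin(omega) ** 2
```

For a single sinusoid the output is constant and depends on both amplitude and frequency. This holds only for a monocomponent signal, which is consistent with the subband analysis described in the abstract.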
This thesis explores the nonlinear nature of a speaker's speech production mechanism. There has been significant advancement in exploring source-based and system-based features for speaker recognition, which are attributed to the characteristics of the excitation source and to the size and shape of the vocal tract, respectively. In this work, TEO phase features are derived first from the fullband speech signal and then from subband speech signals (because the TEO is a monocomponent operator). In addition, a feature set is derived from the residual phase extracted from a nonlinear filter designed using the Volterra-Wiener (VW) series, which exploits higher-order linear as well as nonlinear relationships hidden in the sequence of speech samples.
Experiments have been performed on score-level fusion of the proposed feature sets with state-of-the-art MFCC features for a text-independent Speaker Verification (SV) task, using a Gaussian Mixture Model-Universal Background Model (GMM-UBM) system. The performance of each feature set is evaluated, and a comparative study of the features is presented. The results provide an evaluation of the nature of the speech production mechanism and provide features to improve the performance of an SV system.
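The abstract does not state which fusion rule is used. As a hedged illustration only, score-level fusion is often implemented as a convex combination of the per-trial scores from two systems. The sketch below assumes that rule; the function name `fuse_scores`, the weight value, and the score values are hypothetical.

```python
import numpy as np


def fuse_scores(scores_a, scores_b, weight):
    """Weighted-sum score-level fusion (an assumed, commonly used rule).

    fused = weight * scores_a + (1 - weight) * scores_b
    `weight` must lie in [0, 1]; both score lists must align trial by trial.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("score arrays must have the same shape")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * a + (1.0 - weight) * b


# Hypothetical per-trial scores from a proposed-feature system and an MFCC system.
proposed_scores = [1.0, 2.0]
mfcc_scores = [3.0, 4.0]
fused = fuse_scores(proposed_scores, mfcc_scores, weight=0.25)
```

In practice the scores from the two systems would usually be normalized to a comparable range before fusion, and the weight would be tuned on held-out data.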