Journal Article

Permanent URI for this collection: https://ir.daiict.ac.in/handle/123456789/37

  • Publication
    LP spectra vs. mel spectra for identification of professional mimics in Indian languages
    (Springer, 19-05-2009) Basu, T K; Patil, Hemant; DA-IICT, Gandhinagar
    Automatic Speaker Recognition (ASR) is an economical tool for voice biometrics because low-cost, powerful processors are readily available. For an ASR system to succeed in practical environments, it must have high mimic resistance, i.e., the system should not be defeated by determined mimics, who may be either identical twins or professional mimics. In this paper, we demonstrate the effectiveness of Linear Prediction (LP)-based features, viz., Linear Prediction Coefficients (LPC) and Linear Prediction Cepstral Coefficients (LPCC), over filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and the newly proposed Teager energy-based MFCC (T-MFCC) for the identification of professional mimics in Indian languages. Results are reported for real and fictitious experiments. On the whole, it is observed that LP-based features perform better than filterbank-based features (an average jump of 23.21% and 31.43% for fictitious experiments with a professional mimic in Marathi and Hindi, respectively, whereas there is an average jump of 1.64% for real experiments with a professional mimic in Hindi), and we believe that this is the first time such results on identification of professional mimics in ASR have been obtained. The results are analyzed with the help of the Mean Square Error (MSE) between training and testing utterances for the mimic's imitations of target speakers and the target speakers' normal voices. Fourier spectra and the corresponding LP spectra for a target speaker and its impersonations by the professional mimic are shown to justify the results. Finally, the dependence of LPC on the physiological characteristics of the vocal tract, and its relation to the problem addressed in this paper, is studied.
  • Publication
    Development of speech corpora for speaker recognition research and evaluation in Indian languages
    (Springer, 19-05-2009) Basu, T K; Patil, Hemant; DA-IICT, Gandhinagar
    Automatic Speaker Recognition (ASR) refers to the task of identifying a person based on his or her voice with the help of machines. ASR has potential applications in telephone-based financial transactions, credit card purchases, forensic science, and social anthropology for the study of different cultures and languages. Results of ASR are highly dependent on the database, i.e., the results obtained in ASR are meaningless if the recording conditions are not known. In this paper, a methodology and a typical experimental setup used for the development of corpora for various tasks in text-independent speaker identification in different Indian languages, viz., Marathi, Hindi, Urdu and Oriya, are described. Finally, an ASR system is presented to evaluate the corpora.
  • Publication
    Identifying perceptually similar languages using teager energy based cepstrum
    (Engineering, 19-08-2008) Basu, T K; Patil, Hemant; DA-IICT, Gandhinagar
    Language Identification (LID) refers to the task of identifying an unknown language from test utterances. In this paper, a new feature set, viz., T-MFCC, is developed by amalgamating the Teager Energy Operator (TEO) and the well-known Mel-frequency cepstral coefficients (MFCC). The effectiveness of the newly derived feature set is demonstrated for identifying perceptually similar Indian languages such as Hindi and Urdu. A modified polynomial classifier structure with 2nd- and 3rd-order approximation has been used for the LID problem. The results have been compared with the state-of-the-art feature set, viz., MFCC, and T-MFCC is found to be effective (an average jump of 21.66%) in the majority of cases. This may be because T-MFCC represents the combined effect of airflow properties in the vocal tract (which are known to be language and speaker dependent) and the human perception process for hearing.
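
For readers unfamiliar with the Teager Energy Operator (TEO) mentioned in the abstracts above, the following is a minimal sketch of its standard discrete-time form, psi[n] = x[n]^2 - x[n-1] * x[n+1]. The function name `teager_energy` and the test signal are illustrative assumptions. The abstracts do not say how the authors combine the TEO with the mel filterbank to obtain T-MFCC, so this sketch covers only the operator itself and not their feature extraction pipeline.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Returns an array two samples shorter than the input, because the
    operator is undefined at the first and last sample.
    (Hypothetical helper name; not taken from the listed papers.)
    """
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        raise ValueError("need at least 3 samples")
    return x[1:-1] ** 2 - x[:-2] * x[2:]


# Illustrative check on a pure tone x[n] = A * cos(omega * n + phi):
# the operator output is the constant A^2 * sin(omega)^2.
n = np.arange(200)
amplitude, omega, phase = 2.0, 0.3, 0.1
tone = amplitude * np.cos(omega * n + phase)
psi = teager_energy(tone)
expected = amplitude ** 2 * np.sin(omega) ** 2
```

For a pure tone the output is constant and grows with both amplitude and frequency, which is why the operator is commonly described as tracking the energy of the source that produced the signal and not only the signal's squared amplitude.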