Crying for a reason : a signal processing based approach for infant cry analysis and classification
The work presented in this thesis is directed towards understanding the reason for an infant's crying using signal processing approaches. Infant cry analysis and classification is a non-invasive method of analyzing infant cries and identifying the reason for crying, such as pain, hunger, discomfort or the presence of a disease. Because an infant cannot convey his or her needs to caretakers, an instrument developed for this purpose may help in bringing up an infant, in preventing distress, and thus in improving the infant's quality of life. Developing any computer algorithm for the analysis and classification task requires a database. When a database is created for infant cry analysis and classification, several factors need to be considered in designing the corpus, such as the reason for the cry, the age of the infant and the method of cry generation. The effect of these factors on the behavior of a system developed for infant cry analysis is presented in this thesis, and the ideal characteristics of an infant cry corpus are discussed. The signal processing challenges that arise when state-of-the-art methods are applied to the infant cry signal are also illustrated. The signal processing methods used in this thesis are short-time Fourier transform (STFT) analysis, linear prediction (LP) analysis, cepstrum analysis and Teager energy operator (TEO) analysis. In addition, spectrographic analysis is shown for different pathological cries (such as asthma and meningitis). Different infant cry types are analyzed using acoustic features such as fundamental frequency (F0), energy in different frequency bands, the percentage of unvoiced segments in the cry and the duration of cry units. One-way analysis of variance (ANOVA) is used to assess the significance of these features in cry analysis.
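As an illustration of the one-way ANOVA step mentioned above, the following minimal Python sketch computes the F statistic for one acoustic feature measured across several cry types. The function name `one_way_anova_f` and the F0 values are hypothetical and chosen only for illustration; they are not taken from the thesis, which does not state how the test was implemented.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of sample groups.

    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (N - k)),
    where k is the number of groups and N the total number of samples.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")

    group_means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )
    if ss_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")

    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical F0 values in Hz for three cry types (made-up numbers).
f0_pain = [450.0, 460.0, 470.0]
f0_hunger = [400.0, 410.0, 420.0]
f0_discomfort = [430.0, 440.0, 450.0]

f_stat = one_way_anova_f([f0_pain, f0_hunger, f0_discomfort])
print(f"F statistic for F0 across cry types: {f_stat:.2f}")
```

A large F value suggests that the feature differs between cry types by more than its within-type spread; turning it into a p-value requires the F distribution with (k - 1, N - k) degrees of freedom, which is omitted here.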
Infant cries from various pathological cases are also analyzed using these features, and the differences observed between normal and pathological infant cries are reported in this thesis. Classification of normal and pathological infant cries is also attempted. For this task, bispectrum-based features are used and a classification accuracy of 81.64 % is obtained. The bispectrum-based features are found to perform better than state-of-the-art mel-frequency cepstral coefficient (MFCC) features, and their noise robustness is also shown. Classification of two pathologies, namely asthma and hypoxic ischemic encephalopathy (HIE), is also reported in this thesis. For classifying these two pathological cry signals, features based on glottal inverse filtering, the modulation spectrogram, the auditory spectrogram and group delay are used. All of these features are found to perform well in classifying the two pathologies. Classification of these pathologies against the cries of normal, healthy infants is also attempted. Although the performance of the proposed system is not very good, it can help protect infants by raising an alarm for the presence of HIE, which can result in motor and physical handicap if left unattended.
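To give a sense of what a bispectrum measures, the following minimal Python sketch evaluates the direct (FFT-type) bispectrum estimate B(k1, k2) = E[X(k1) X(k2) X*(k1 + k2)] at a single pair of frequency bins, averaged over segments. It is only a sketch of the textbook definition: the thesis abstract does not describe how its bispectrum-based features are extracted, and the names `dft_bin` and `bispectrum`, the segment length and the test signals are assumptions made for illustration.

```python
import cmath
import math


def dft_bin(x, k):
    """Return the k-th DFT coefficient of the real sequence x."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def bispectrum(segments, k1, k2):
    """Direct bispectrum estimate at bins (k1, k2), averaged over segments."""
    total = 0j
    for seg in segments:
        total += (
            dft_bin(seg, k1)
            * dft_bin(seg, k2)
            * dft_bin(seg, k1 + k2).conjugate()
        )
    return total / len(segments)


N = 32  # hypothetical segment length in samples


def tone(k):
    """One segment of a unit-amplitude cosine centred on DFT bin k."""
    return [math.cos(2 * math.pi * k * t / N) for t in range(N)]


# Three components where the third sits at the sum frequency (3 + 5 = 8),
# as it would if it arose from a quadratic interaction of the first two.
coupled = [a + b + c for a, b, c in zip(tone(3), tone(5), tone(8))]
# Same first two components, but the third is at bin 9, not the sum frequency.
uncoupled = [a + b + c for a, b, c in zip(tone(3), tone(5), tone(9))]

b_coupled = bispectrum([coupled], 3, 5)
b_uncoupled = bispectrum([uncoupled], 3, 5)
print(abs(b_coupled), abs(b_uncoupled))
```

The estimate is large only when energy is present at both bins and at their sum with a consistent phase relation, which is information the power spectrum (and hence MFCC-type features) does not retain. In practice the average runs over many windowed segments rather than the single segment used here.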