Acoustic source localization using audio and video sensors
The problem of localizing an acoustic source using a microphone array and a video camera is studied. A human with both eyes and ears can generally perceive the surroundings more accurately than one who is either blind or deaf. We take the position that, like a human, a machine can localize an object better if it relies on both audio and video sensors. For localization with the microphone array, Time Delay Estimate (TDE) based localization schemes are used; TDE-based localizers are fast enough to give results in real time. Generalized Cross Correlation with the Phase Transform (GCC-PHAT) is used to find the time delays of arrival at the microphone array. From these time delays of arrival and the known array geometry, spherical equations are written, and these equations are solved by least-squares estimation to obtain the position of the source. For the video cue, we localize the human face/body in a given image/video. The clustering property of human skin in the YCbCr color space is exploited for this task. A skin color model is built from a large image database to segment human skin from the image. The database is collected so that the same model works for different skin colors (white, black, and yellow). Nearly one crore (ten million) pixels (twenty-five lakh skin pixels and seventy-five lakh non-skin pixels) from twenty different people under different illumination conditions are used to model the skin-color and non-skin-color histograms. Once the face location in the image is found, a lookup-table method is described that converts a given pixel position to a location in the room with respect to the camera coordinates for fixed distances. Finally, the audio and video estimates are fused to give a better estimate. It is shown in this thesis that taking the video cues along with the audio cues improves the estimate. The developed localizer gives two estimates of the source position per second.
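The GCC-PHAT step described above can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `gcc_phat` and the interpolation-free integer-lag peak search are assumptions; the PHAT weighting (dividing the cross-spectrum by its magnitude so only phase remains) is the standard technique named in the abstract.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the time delay of arrival (in seconds) of `sig`
    relative to `ref` using Generalized Cross Correlation with
    the Phase Transform (GCC-PHAT) weighting."""
    n = len(sig) + len(ref)            # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-15             # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:            # optionally bound the search by mic spacing
        max_shift = min(int(fs * max_tau), max_shift)
    # rearrange so that lag 0 sits at the centre of the correlation
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs                  # positive delay => sig lags ref
```

The whitening makes the correlation peak sharp even in mildly reverberant rooms, which is why PHAT is the usual weighting for real-time TDE localizers.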
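The "spherical equations solved by least squares" step can also be sketched. Taking the reference microphone as origin, squaring the range equations turns the unknown source position and its range to the reference into a linear system; the function name `tdoa_least_squares`, the choice of mic 0 as reference, and the speed of sound `c = 343` m/s are illustrative assumptions, not details from the thesis.

```python
import numpy as np

def tdoa_least_squares(mics, taus, c=343.0):
    """Least-squares source position from TDOAs.
    mics: (N, 3) microphone positions; mics[0] is the reference.
    taus: (N-1,) delays of mics 1..N-1 relative to mic 0, in seconds.
    Squaring ||s - m_i|| = r0 + d_i (with d_i = c*tau_i, r0 the
    source-to-reference range) gives the linear system
        m_i . s + d_i * r0 = (||m_i||^2 - d_i^2) / 2
    in the unknowns [s, r0]."""
    m0 = mics[0]
    M = mics[1:] - m0                      # positions relative to reference mic
    d = c * np.asarray(taus)               # range differences
    A = np.column_stack((M, d))            # unknown vector: [x, y, z, r0]
    b = 0.5 * (np.sum(M**2, axis=1) - d**2)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:3] + m0                    # source position in world coordinates
```

With four unknowns, at least four TDOAs (five microphones) are needed; extra microphones overdetermine the system and let the least-squares fit average out TDE noise.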
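For the video cue, the idea that skin clusters compactly in the Cb/Cr plane can be illustrated with a simple threshold box. The thesis uses trained skin/non-skin histograms; the fixed bounds below (Cb in [77, 127], Cr in [133, 173]) are commonly quoted rule-of-thumb values standing in for that learned model, and the function name `skin_mask` is hypothetical.

```python
import numpy as np

def skin_mask(rgb):
    """Boolean mask of candidate skin pixels in an (H, W, 3) uint8
    RGB image, by thresholding the chrominance (Cb, Cr) plane."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    # ITU-R BT.601 RGB -> YCbCr; luma Y is dropped because skin of
    # different tones clusters mainly in chrominance, not brightness
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)
```

Ignoring the luma channel is what lets a single chrominance model cover light and dark skin under varying illumination, which matches the abstract's claim that one model serves all skin colors.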
- M Tech Dissertations 
Showing items related by title, author, creator and subject.
Rajpal, Avni (Dhirubhai Ambani Institute of Information and Communication Technology, 2015) The ability of humans to speak effortlessly requires coordinated movements of various articulators, muscles, etc. This effortless movement contributes towards naturalness, intelligibility and speaker identity in human ...
Lakshmipriya, V. K. (Dhirubhai Ambani Institute of Information and Communication Technology, 2014) This thesis is a systematic investigation on the acoustics of musical pillars of Vitthala temple at Hampi, India. The columns of different pillars produce sounds of different musical instruments (in particular, instruments ...
Madhavi, Maulik C. (Dhirubhai Ambani Institute of Information and Communication Technology, 2011) In this thesis, the design of a person recognition system based on a person's hum is presented. As hum is a nasalized sound and the LP (Linear Prediction) model does not characterize nasal sounds sufficiently, our approach in this work ...