Acoustic source localization using audio and video sensors

Mouli, P. Chandra

Date

2005

Abstract

The problem of localizing an acoustic source using a microphone array and a video camera is studied. A human with both sight and hearing can generally locate things more accurately than one who is blind or deaf. We take the position that a machine, like a human, can localize an object better if it relies on both audio and video sensors. For localization with the microphone array, Time Delay Estimate (TDE) based schemes are used; TDE-based localizers are fast enough to give results in real time. Generalized Cross Correlation with the Phase Transform (GCC-PHAT) is used to find the time delays of arrival at the microphone array. From these time delays of arrival and the known array geometry, spherical equations are written and solved using least-squares estimation to obtain the position of the source. To obtain the video cue, we attempt to localize the human face or body in a given image or video. The clustering property of human skin in the YCbCr color space is exploited for this task. A skin color model is built from a large image database to segment human skin from the image. The database is collected so that the same model works for different skin colors (white, black and yellow). Nearly one crore (10 million) pixels, twenty-five lakh skin pixels and seventy-five lakh non-skin pixels, from twenty different people under different illumination conditions are used to model the skin-color and non-skin-color histograms. Once the face location in the image is found, a lookup-table method is discussed that converts a given pixel number to a location in the room with respect to camera coordinates, for fixed distances. The audio and video estimates are then fused to give a better estimate. It is shown in this thesis that using the video cues along with the audio cues improves the estimate. The developed localizer can give two estimates of the source position per second.
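To illustrate the time-delay step named in the abstract, the following is a minimal sketch of Generalized Cross Correlation with the Phase Transform for one microphone pair. It is not taken from the thesis: the function name `gcc_phat`, its parameters, the use of NumPy, and the small regularization constant are all assumptions made for illustration, and the thesis implementation may differ.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds.

    Hypothetical helper for illustration only. `fs` is the sampling
    rate in Hz; `max_tau` optionally bounds the search to a physically
    plausible delay (e.g. microphone spacing / speed of sound).
    """
    sig = np.asarray(sig, dtype=float)
    ref = np.asarray(ref, dtype=float)

    # Zero-pad so the circular correlation behaves like a linear one.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, whitened by its magnitude (the PHAT weighting).
    # The small constant is an assumption to avoid division by zero.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12

    cc = np.fft.irfft(R, n=n)

    # Rearrange so that index `max_shift` corresponds to zero lag.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    # The lag of the correlation peak is the delay estimate in samples.
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)
```

Under these assumptions, a positive return value means `sig` lags `ref`. Delays estimated this way for several microphone pairs, together with the known array geometry, would then feed the least-squares solution of the spherical equations described above.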

URI

http://drsr.daiict.ac.in/handle/123456789/73

Collections

M Tech Dissertations [923]