Acoustic source localization using audio and video sensors

Mouli, P. Chandra

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/73

Title:	Acoustic source localization using audio and video sensors
Authors:	Chakka, Vijaykumar; Mouli, P. Chandra
Keywords:	Acoustical communication; Acoustical engineering; Acoustical source; Acoustic source localization; Audio sensors; Sensors; Video sensors
Issue Date:	2005
Publisher:	Dhirubhai Ambani Institute of Information and Communication Technology
Citation:	Mouli, P. Chandra (2005). Acoustic source localization using audio and video sensors. Dhirubhai Ambani Institute of Information and Communication Technology, ix, 45 p. (Acc.No: T00036)
Abstract:	This thesis studies the problem of localizing an acoustic source using a microphone array and a video camera. A human with both eyes and ears can perceive things more accurately than one who is either blind or deaf. We take the position that a machine, like a human, can localize an object better if it relies on both audio and video sensors. For localization using the microphone array, Time Delay Estimate (TDE) based localization schemes are used. The TDE-based localizers are fast enough to give results in real time. Generalized Cross Correlation with the Phase Transform is used to find the time delays of arrival at the microphone array. With these time delays of arrival and the known array geometry, spherical equations are written and solved using least-squares estimation to obtain the position of the source. To obtain the video cue, we localize the human face/body in a given image/video. The clustering property of human skin in the YCbCr color space is exploited for this task. A skin color model is built from a large image database to segment human skin from the image. The database is collected so that the same model works for different skin colors (white, black and yellow). Nearly one crore (ten million) pixels (twenty-five lakh skin pixels and seventy-five lakh non-skin pixels) from twenty different people under different illumination conditions are used to model the skin-color and non-skin-color histograms. Once the face location in the image is found, a lookup-table method is discussed that converts a given pixel number to a location in the room with respect to camera coordinates, for fixed distances. The audio and video estimates are then fused to give a better estimate. The thesis shows that taking the video cues along with the audio cues improves the estimate. The developed localizer can give two estimates of the source position per second.
URI:	http://drsr.daiict.ac.in/handle/123456789/73
Appears in Collections:	M Tech Dissertations
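
The time-delay step named in the abstract (Generalized Cross Correlation with the Phase Transform) can be illustrated for a single microphone pair. The following is a minimal sketch and is not taken from the thesis: the function name `gcc_phat`, the 16 kHz sampling rate, and the synthetic white-noise signals are all assumptions made for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref`, in seconds (hypothetical helper)."""
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    # Cross-power spectrum of the two channels.
    R = SIG * np.conj(REF)
    # PHAT weighting: keep only the phase; the small constant avoids division by zero.
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)
    # Reorder so that negative lags come first and lag 0 sits at index max_shift.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift
    return lag / float(fs)

if __name__ == "__main__":
    fs = 16000                           # assumed sampling rate in Hz
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(1024)      # synthetic reference-microphone signal
    delay = 5                            # assumed true delay in samples
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    tau = gcc_phat(sig, ref, fs)
    print("estimated delay in samples:", round(tau * fs))
```

In a full localizer, delays estimated this way for several microphone pairs would feed the least-squares position solve mentioned in the abstract; that step is not shown here.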

Files in This Item:

File	Description	Size	Format
200311009.pdf (Restricted Access)		4.09 MB	Adobe PDF
