Acoustic source localization using audio and video sensors

Mouli, P. Chandra

dc.contributor.advisor	Chakka, Vijaykumar
dc.contributor.author	Mouli, P. Chandra
dc.date.accessioned	2017-06-10T14:36:54Z
dc.date.available	2017-06-10T14:36:54Z
dc.date.issued	2005
dc.identifier.citation	Mouli, P. Chandra (2005). Acoustic source localization using audio and video sensors. Dhirubhai Ambani Institute of Information and Communication Technology, ix, 45 p. (Acc.No: T00036)
dc.identifier.uri	http://drsr.daiict.ac.in/handle/123456789/73
dc.description.abstract	This thesis studies the problem of localizing an acoustic source using a microphone array and a video camera. A human with both sight and hearing can generally perceive the surroundings more accurately than one who is blind or deaf. We take the position that a machine, like a human, can localize an object better if it relies on both audio and video sensors. For localization with the microphone array, Time Delay Estimate (TDE) based localization schemes are used; TDE-based localizers are fast enough to give results in real time. Generalized Cross Correlation with the Phase Transform is used to find the time delays of arrival at the microphone array. From these time delays of arrival and the known array geometry, spherical equations are written and solved by least-squares estimation to obtain the position of the source. To obtain the video cue, we localize the human face/body in a given image or video by exploiting the clustering property of human skin in the YCbCr color space. A skin color model is built from a large image database to segment human skin from the image. The database is collected so that the same model works for different skin colors (white, black and yellow). Nearly one crore (ten million) pixels, about twenty-five lakh skin pixels and seventy-five lakh non-skin pixels, from twenty different people under different illumination conditions are used to model the skin-color and non-skin-color histograms. Once the face location in the image is found, a lookup table method is discussed that converts a given pixel number to a location in the room with respect to camera coordinates, for fixed distances. The audio and video estimates are then fused to give a better estimate. It is shown in this thesis that taking the video cues along with the audio cues improves the estimate.
The developed localizer can give two estimates of the source position per second.
dc.publisher	Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject	Acoustical communication
dc.subject	Acoustical engineering
dc.subject	Acoustical source
dc.subject	Acoustic source localization
dc.subject	Audio sensors
dc.subject	Sensors
dc.subject	Video sensors
dc.classification.ddc	621.3828 MOU
dc.title	Acoustic source localization using audio and video sensors
dc.type	Dissertation
dc.degree	M. Tech
dc.student.id	200311009
dc.accession.number	T00036
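
Illustrative note on the abstract: the time-delay step it describes (Generalized Cross Correlation with the Phase Transform) can be sketched as below. This is a minimal sketch under stated assumptions, not the thesis implementation: the function name gcc_phat, its parameters, the 16 kHz sampling rate and the synthetic white-noise test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of `sig` relative to `ref`, in seconds (hypothetical helper)."""
    n = len(sig) + len(ref)              # zero-pad so the correlation is linear, not circular
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)           # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # phase transform: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)        # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

# Synthetic check: the second channel is the first one delayed by 5 samples.
fs = 16000                               # assumed sampling rate in Hz
rng = np.random.default_rng(0)
ref = rng.standard_normal(4096)
sig = np.concatenate((np.zeros(5), ref[:-5]))
tau = gcc_phat(sig, ref, fs)             # estimated delay in seconds
```

One such delay per microphone pair, combined with the known array geometry, would then feed the least-squares solution of the spherical equations mentioned in the abstract.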

Files in this item

Name: 200311009.pdf
Size: 3.997Mb
Format: PDF

This item appears in the following Collection(s)

M Tech Dissertations [923]

Related items

Acoustic-to-articulatory inversion: speech quality assessment and smoothness constraint

Acoustic analysis of musical pillars of vitthala temple, Hampi

Person recognition from their hum