dc.contributor.advisor: Chakka, Vijaykumar
dc.contributor.author: Mouli, P. Chandra
dc.date.accessioned: 2017-06-10T14:36:54Z
dc.date.available: 2017-06-10T14:36:54Z
dc.date.issued: 2005
dc.identifier.citation: Mouli, P. Chandra (2005). Acoustic source localization using audio and video sensors. Dhirubhai Ambani Institute of Information and Communication Technology, ix, 45 p. (Acc.No: T00036)
dc.identifier.uri: http://drsr.daiict.ac.in/handle/123456789/73
dc.description.abstract: This thesis studies the problem of localizing an acoustic source using a microphone array and a video camera. A person who can both see and hear generally perceives the surroundings more accurately than one who relies on sight or hearing alone. We take the position that a machine, like a human, can localize an object better if it relies on both audio and video sensors. For localization with the microphone array, Time Delay Estimate (TDE) based schemes are used; TDE-based localizers are fast enough to give results in real time. Generalized Cross Correlation with the Phase Transform is used to find the time delays of arrival at the microphone array. From these time delays of arrival and the known array geometry, spherical equations are written and solved by least-squares estimation to obtain the position of the source. To obtain the video cue, the human face or body is localized in a given image or video by exploiting the clustering property of human skin in the YCbCr color space. A skin color model is built from a large image database to segment human skin from the image. The database was collected so that the same model works for different skin colors (white, black and yellow). Nearly 10 million (one crore) pixels, 2.5 million skin pixels and 7.5 million non-skin pixels, from twenty different people under different illumination conditions are used to model the skin-color and non-skin-color histograms. Once the face location in the image is found, a lookup-table method is discussed that converts a given pixel number to a location in the room with respect to camera coordinates for fixed distances. The audio and video estimates are then fused to give a better estimate. The thesis shows that using the video cues along with the audio cues improves the estimate. The developed localizer gives two estimates of the source position per second.
dc.publisher: Dhirubhai Ambani Institute of Information and Communication Technology
dc.subject: Acoustical communication
dc.subject: Acoustical engineering
dc.subject: Acoustical source
dc.subject: Acoustic source localization
dc.subject: Audio sensors
dc.subject: Sensors
dc.subject: Video sensors
dc.classification.ddc: 621.3828 MOU
dc.title: Acoustic source localization using audio and video sensors
dc.type: Dissertation
dc.degree: M. Tech
dc.student.id: 200311009
dc.accession.number: T00036
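
The abstract names Generalized Cross Correlation with the Phase Transform (GCC-PHAT) as the time delay estimator for the microphone array. The following is a minimal NumPy sketch of that general technique, not the implementation used in the thesis. The function name `gcc_phat`, the 16 kHz sampling rate, and the synthetic white-noise signal are all illustrative assumptions.

```python
import numpy as np


def gcc_phat(sig, ref, fs):
    """Estimate the delay of `sig` relative to `ref`, in seconds.

    Hypothetical helper for illustration: cross-correlates the two
    signals in the frequency domain with PHAT weighting.
    """
    # Zero-pad to the combined length so the circular correlation
    # computed via the FFT does not wrap around.
    n = sig.shape[0] + ref.shape[0]
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum, normalized to unit magnitude (the PHAT
    # weighting) so that only phase information remains.
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12  # small constant avoids division by zero

    cc = np.fft.irfft(R, n=n)

    # Reorder so that negative lags come first and lag 0 is centered.
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    # The lag of the correlation peak is the delay estimate in samples.
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / float(fs)


# Synthetic check (assumed values): delay white noise by 5 samples.
fs = 16000
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)
true_delay = 5
sig = np.concatenate((np.zeros(true_delay), ref[:-true_delay]))

tau = gcc_phat(sig, ref, fs)
print("estimated delay in samples:", round(tau * fs))
```

With delays estimated this way for each microphone pair and a known array geometry, the source position can then be obtained by a least-squares solve, as the abstract describes.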

