Data Augmentation Using CycleGAN for Children’s ASR

Singh, Dipesh Kumar

Please use this identifier to cite or link to this item: http://drsr.daiict.ac.in//handle/123456789/1065

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	Patil, Hemant A.
dc.contributor.advisor	Sailor, Hardik B.
dc.contributor.author	Singh, Dipesh Kumar
dc.date.accessioned	2022-05-06T14:15:22Z
dc.date.available	2023-02-27T14:15:22Z
dc.date.issued	2021
dc.identifier.citation	Singh, Dipesh Kumar (2021). Data Augmentation Using CycleGAN for Children’s ASR. Dhirubhai Ambani Institute of Information and Communication Technology. xiii, 78 p. (Acc.No: T00982)
dc.identifier.uri	http://drsr.daiict.ac.in//handle/123456789/1065
dc.description.abstract	The extensive use of voice assistants by children in their day-to-day activities calls for better Automatic Speech Recognition (ASR) performance on children’s speech. Recent advances in ASR perform better for adult speech. However, due to acoustic mismatch (in particular, higher pitch frequency and hence poorer spectral resolution) and the limited availability of children’s speech data, improving the performance of children’s ASR remains a challenge. Because little children’s speech data is available to train deep neural networks, data augmentation is one of the key research areas for children’s ASR. This thesis explores well-known data augmentation approaches from the literature, i.e., audio (speed and tempo) perturbation and SpecAugment. A voice conversion-based data augmentation technique using a Cycle-consistent Generative Adversarial Network (CycleGAN) is then proposed for hybrid DNN-HMM and end-to-end (E2E) ASR systems. Here, CycleGAN training is exploited for children-to-children voice conversion for the hybrid DNN-HMM ASR system and for adult-to-children voice conversion for the E2E ASR system. Performance with and without data augmentation is compared across the different augmentation strategies. ASR experiments were performed using the children’s ASR corpora released in INTERSPEECH challenges. The effect of using out-of-domain data for data augmentation is observed, in particular, for male-to-children and female-to-children class voice conversion. Both approaches performed well, with a significant reduction in the word error rate (WER) of the children’s ASR system. Another application of the proposed CycleGAN architecture is investigated in a voice privacy system, where male-to-female and female-to-male class mappings are obtained to modify the speaker-specific information, thus providing a good anonymization method.
dc.subject	Data Augmentation
dc.subject	CycleGAN
dc.subject	Automatic Speech Recognition
dc.subject	Perturbation
dc.subject	SpecAugment
dc.subject	Cycleconsistent
dc.classification.ddc	006.32 SIN
dc.title	Data Augmentation Using CycleGAN for Children’s ASR
dc.type	Dissertation
dc.degree	M. Tech
dc.accession.number	T00982
Appears in Collections:	M Tech Dissertations

Files in This Item:

File	Description	Size	Format
201911057_Dipesh_Singh - hemant patil.pdf Restricted Access		10.09 MB	Adobe PDF