Sensitivity Analysis of MaskCycleGAN based Voice Conversion for Enhancing Cleft Lip and Palate Speech Recognition
S. Bhattacharjee, R. Sinha
2022 IEEE International Conference on Signal Processing and Communications (SPCOM), published 2022-07-11
DOI: 10.1109/SPCOM55316.2022.9840769 (https://doi.org/10.1109/SPCOM55316.2022.9840769)
Citation count: 1
Abstract
Cleft lip and palate (CLP) is a congenital disorder that distorts an individual's speech. As a result, CLP speech is poorly handled by automatic speech recognition (ASR) systems. Existing work on CLP speech enhancement relies on CycleGAN-VC, a non-parallel voice conversion method. However, CycleGAN-VC cannot capture time-frequency structures, whereas MaskCycleGAN-VC can, by applying a module called time-frequency adaptive normalization. MaskCycleGAN-VC has the further advantage of converting mel-spectrograms rather than mel-spectra. Converting CLP speech to normal speech increases its intelligibility and thereby allows ASR systems to recognize the uttered sentences, a capability that matters in daily life as speech recognition devices automate more and more everyday tasks. To develop such an assistive technology, however, it is essential to study how sensitive the automatic speech recognizers are. This work therefore analyzes the sensitivity of a MaskCycleGAN-based voice conversion system to variations in acoustic mismatch and gender mismatch.
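The abstract credits the ability to capture time-frequency structure to time-frequency adaptive normalization (TFAN). As a rough illustration only, the NumPy sketch below assumes the TFAN formulation introduced by Kaneko et al. for CycleGAN-VC3: a feature map is instance-normalized per channel, which flattens its time-frequency structure, and is then rescaled and shifted by maps γ and β that are computed from the source mel-spectrogram at every frequency-time position. In the published module γ and β come from a small CNN applied to a resized source spectrogram; here a hypothetical per-channel affine map of the source stands in for that CNN, the source is assumed to be already resized to the feature map's resolution, and `tfan` and its weight parameters are illustrative names, not code from this paper.

```python
import numpy as np


def instance_norm(f, eps=1e-5):
    """Normalize each channel of f (shape C x Q x T) over its frequency-time plane."""
    mu = f.mean(axis=(1, 2), keepdims=True)
    sigma = f.std(axis=(1, 2), keepdims=True)
    return (f - mu) / (sigma + eps)


def tfan(f, x, w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Simplified TFAN sketch (assumption: a per-channel affine map replaces the CNN).

    f: feature map, shape (C, Q, T) = (channels, frequency bins, frames).
    x: source mel-spectrogram, assumed already resized to shape (Q, T).
    w_*, b_*: hypothetical per-channel weights and biases, each of shape (C,).
    """
    # Scale and bias vary over every (frequency, time) position of the source,
    # which reinjects the time-frequency structure removed by instance_norm.
    gamma = w_gamma[:, None, None] * x[None, :, :] + b_gamma[:, None, None]
    beta = w_beta[:, None, None] * x[None, :, :] + b_beta[:, None, None]
    return gamma * instance_norm(f, eps) + beta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(4, 8, 16))  # toy feature map: 4 channels, 8 bins, 16 frames
    src = rng.normal(size=(8, 16))       # toy source mel-spectrogram
    ones, zeros = np.ones(4), np.zeros(4)
    y = tfan(feats, src, ones, ones, ones, zeros)
    print(y.shape)  # (4, 8, 16)
```

Setting all weights `w_*` to zero with `b_gamma = 1` and `b_beta = 0` reduces the sketch to plain instance normalization, which makes the contrast explicit: only the source-dependent terms let the output vary across individual time-frequency bins.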