Abhijith Madan, Ayush Khopkar, Shreekantha Nadig, M. SrinivasaRaghavanK., Dhanya Eledath, V. Ramasubramanian
2020 International Conference on Signal Processing and Communications (SPCOM), published 2020-07-01. DOI: 10.1109/SPCOM50965.2020.9179517 (https://doi.org/10.1109/SPCOM50965.2020.9179517)
Semi-supervised learning for acoustic model retraining: Handling speech data with noisy transcript
We address the problem of retraining a seed acoustic model on a large corpus whose transcripts are noisy. We propose an iterative selection of corpus data based on forced-alignment likelihood and a fuzzy string matching score, retraining the acoustic model in order of increasing transcript noise. This yields a succession of improved acoustic models with progressively lower error rates on held-out test data. We report results in terms of phoneme error rate (PER) on a large multilingual corpus of transcribed broadcast news speech from a national broadcast network, demonstrating the strong utility of this approach for training acoustic models from noisy transcripts.
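The selection scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not say how the two scores are combined or how the corpus is partitioned, so everything concrete here is an assumption. The `Utterance` fields, the rank-sum combination in `order_by_noise`, the equal-size stages, and the `retrain` callback are all hypothetical. The fuzzy match is approximated with the standard library's `difflib.SequenceMatcher` on word sequences, and scores are computed once rather than being refreshed after each retraining pass.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, List, Sequence, TypeVar

Model = TypeVar("Model")


@dataclass
class Utterance:
    """Hypothetical record for one utterance of the noisy corpus."""
    utt_id: str
    transcript: str      # the possibly noisy transcript shipped with the corpus
    hypothesis: str      # decoder output from the current model (assumed available)
    align_loglik: float  # forced-alignment log-likelihood (assumed available)


def fuzzy_score(reference: str, hypothesis: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means the word sequences match exactly."""
    return SequenceMatcher(None, reference.split(), hypothesis.split()).ratio()


def order_by_noise(utts: Sequence[Utterance]) -> List[Utterance]:
    """Order utterances from least to most noisy.

    Assumption: the two scores are combined by summing their ranks.
    The source does not specify the combination rule.
    """
    by_fuzzy = sorted(
        utts, key=lambda u: fuzzy_score(u.transcript, u.hypothesis), reverse=True
    )
    by_loglik = sorted(utts, key=lambda u: u.align_loglik, reverse=True)
    rank = {u.utt_id: 0 for u in utts}
    for position, utt in enumerate(by_fuzzy):
        rank[utt.utt_id] += position
    for position, utt in enumerate(by_loglik):
        rank[utt.utt_id] += position
    return sorted(utts, key=lambda u: rank[u.utt_id])


def iterative_retrain(
    utts: Sequence[Utterance],
    n_stages: int,
    retrain: Callable[[List[Utterance]], Model],
) -> List[Model]:
    """Retrain on a growing subset, admitting noisier utterances at each stage.

    `retrain` is a hypothetical callback that trains an acoustic model on the
    selected utterances and returns it. One model is returned per stage.
    """
    if not utts or n_stages < 1:
        return []
    ordered = order_by_noise(utts)
    stage_size = -(-len(ordered) // n_stages)  # ceiling division
    selected: List[Utterance] = []
    models: List[Model] = []
    for start in range(0, len(ordered), stage_size):
        selected.extend(ordered[start:start + stage_size])
        models.append(retrain(list(selected)))
    return models
```

In a real pipeline, `retrain` would wrap the toolkit's training recipe, and the hypotheses and alignment likelihoods would likely be recomputed with each new model before the next stage is selected.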