K. Radzikowski, Mateusz Forc, Le Wang, O. Yoshie, R. Nowak
{"title":"非母语语音识别的口音中和","authors":"K. Radzikowski, Mateusz Forc, Le Wang, O. Yoshie, R. Nowak","doi":"10.1145/3366030.3366083","DOIUrl":null,"url":null,"abstract":"These days, automatic speech recognition (ASR) systems achieve higher and higher accuracy rates. The score drops significantly, in case when the ASR system is being used with a non-native speaker of the language to be recognized. The main reason is specific pronunciation and accent features. A limited volume of labeled non-native speech datasets makes it difficult to train new ASR systems for non-native speakers. In our research, we tried tackling the problem and its influence on the accuracy of ASR systems, using the style transfer methodology. We designed a pipeline for modifying the speech of a non-native speaker, so that it resembles the native speech to a higher extent. Our methodology can be used as a wrapper for any existing ASR system, which reduces the necessity of training new algorithms for non-native speech. The modification can be thus performed before passing the data forward to the speech recognition system itself.","PeriodicalId":446280,"journal":{"name":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","volume":"72 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Accent neutralization for speech recognition of non-native speakers\",\"authors\":\"K. Radzikowski, Mateusz Forc, Le Wang, O. Yoshie, R. Nowak\",\"doi\":\"10.1145/3366030.3366083\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"These days, automatic speech recognition (ASR) systems achieve higher and higher accuracy rates. The score drops significantly, in case when the ASR system is being used with a non-native speaker of the language to be recognized. The main reason is specific pronunciation and accent features. A limited volume of labeled non-native speech datasets makes it difficult to train new ASR systems for non-native speakers. In our research, we tried tackling the problem and its influence on the accuracy of ASR systems, using the style transfer methodology. We designed a pipeline for modifying the speech of a non-native speaker, so that it resembles the native speech to a higher extent. Our methodology can be used as a wrapper for any existing ASR system, which reduces the necessity of training new algorithms for non-native speech. The modification can be thus performed before passing the data forward to the speech recognition system itself.\",\"PeriodicalId\":446280,\"journal\":{\"name\":\"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services\",\"volume\":\"72 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3366030.3366083\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st International Conference on Information Integration and Web-based Applications & Services","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3366030.3366083","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Accent neutralization for speech recognition of non-native speakers
These days, automatic speech recognition (ASR) systems achieve higher and higher accuracy rates. The score drops significantly, in case when the ASR system is being used with a non-native speaker of the language to be recognized. The main reason is specific pronunciation and accent features. A limited volume of labeled non-native speech datasets makes it difficult to train new ASR systems for non-native speakers. In our research, we tried tackling the problem and its influence on the accuracy of ASR systems, using the style transfer methodology. We designed a pipeline for modifying the speech of a non-native speaker, so that it resembles the native speech to a higher extent. Our methodology can be used as a wrapper for any existing ASR system, which reduces the necessity of training new algorithms for non-native speech. The modification can be thus performed before passing the data forward to the speech recognition system itself.