A New Spoken Language Teaching Tech: Combining Multi-attention and AdaIN for One-shot Cross Language Voice Conversion
Dengfeng Ke, Wenhan Yao, Ruixin Hu, Liangjie Huang, Qi Luo, Wentao Shu
2022 13th International Symposium on Chinese Spoken Language Processing (ISCSLP), 2022-12-11
https://doi.org/10.1109/ISCSLP57327.2022.10038137
Computer-aided pronunciation training (CAPT) plays an important role in oral language teaching. Traditional computer-assisted oral teaching relies mainly on two methods: mispronunciation detection, and pronunciation scoring and assessment. However, both give only negative feedback such as scores or error categories. Without correct speech to guide them, learners find it difficult to refine their pronunciation from these two indicators alone. To tackle this problem, we propose a cross-language voice conversion (VC) framework that generates speech with the content of the template speech and the learner's own timbre, which can guide the learner's pronunciation. To improve VC quality, we apply adaptive instance normalization (AdaIN) twice: once at the front end and once after the Value matrix in multi-head attention. We call this design attention-AdaIN; it improves both style transfer and sequence generation. We use attention-AdaIN to build a VC framework based on a variational autoencoder (VAE). Experiments on the AISHELL-3 and VCTK corpora show that the new approach improves on the baseline VAE-VC.
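To make the attention-AdaIN idea concrete, the following is a minimal sketch in dependency-free Python, not the authors' implementation. It assumes features are lists of frames (time by channel), uses a single attention head for brevity, reads "front end" as the input to the attention layer, and takes the style statistics for the Value path by projecting the style features through the same Value matrix. The names `column_stats`, `adain`, `attention_adain`, `w_q`, `w_k`, and `w_v` are hypothetical and chosen only for illustration.

```python
import math
from statistics import fmean, pstdev


def column_stats(frames):
    """Per-channel mean and population std over time for a list of frames."""
    channels = list(zip(*frames))
    return [fmean(c) for c in channels], [pstdev(c) for c in channels]


def adain(content, style, eps=1e-5):
    """Normalize each content channel over time, then re-scale and shift it
    with the matching channel statistics of the style features."""
    c_mean, c_std = column_stats(content)
    s_mean, s_std = column_stats(style)
    return [
        [(x - cm) / (cs + eps) * ss + sm
         for x, cm, cs, sm, ss in zip(frame, c_mean, c_std, s_mean, s_std)]
        for frame in content
    ]


def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_adain(content, style, w_q, w_k, w_v):
    """Single-head self-attention with AdaIN applied twice (assumed layout)."""
    x = adain(content, style)                        # AdaIN at the front end
    q, k = matmul(x, w_q), matmul(x, w_k)
    v = adain(matmul(x, w_v), matmul(style, w_v))    # AdaIN after the Value projection
    scale = math.sqrt(len(q[0]))
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


if __name__ == "__main__":
    # Toy example: 3 content frames and 2 style frames, 2 channels each.
    content = [[0.0, 1.0], [2.0, 5.0], [4.0, 3.0]]
    style = [[10.0, -1.0], [14.0, -3.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    for frame in attention_adain(content, style, identity, identity, identity):
        print(frame)
```

In this sketch, the output of `adain` keeps the temporal pattern of the content features while taking on the per-channel mean and standard deviation of the style features, which is the mechanism that lets the converted speech carry the learner's timbre. A real system would operate on learned encoder features, use several heads, and sit inside the VAE-based framework described in the abstract.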