Relevant Phonetic-aware Neural Acoustic Models using Native English and Japanese Speech for Japanese-English Automatic Speech Recognition
Ryo Masumura, Suguru Kabashima, Takafumi Moriya, Satoshi Kobashikawa, Y. Yamaguchi, Y. Aono
2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), November 2018. DOI: 10.23919/APSIPA.2018.8659784
This paper proposes relevant phonetic-aware neural acoustic models that leverage native Japanese speech and native English speech to improve automatic speech recognition (ASR) of Japanese-English speech. Accurately transcribing Japanese-English speech requires acoustic models specific to Japanese-English, since its pronunciations differ from those of native English speech. The major problem is that it is difficult to collect large amounts of Japanese-English speech for constructing acoustic models. Our motivation is therefore to efficiently leverage the large amounts of native English and native Japanese speech material available, since Japanese-English is strongly influenced by both native English and native Japanese. Our idea is to utilize them indirectly to enhance the phonetic awareness of Japanese-English acoustic models. We expect that native English speech is effective in enhancing the classification performance of English-like phonemes, while native Japanese speech is effective in enhancing the classification performance of Japanese-like phonemes. In the proposed relevant phonetic-aware neural acoustic models, this idea is implemented by utilizing bottleneck features extracted from native English and native Japanese neural acoustic models. Our experiments construct the relevant phonetic-aware neural acoustic models using 300 hours of Japanese-English speech, 1,500 hours of native Japanese speech, and 900 hours of native English speech. We demonstrate the effectiveness of our proposal on evaluation data sets covering four levels of Japanese-English.
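To make the bottleneck-feature idea concrete, the following PyTorch sketch shows one plausible way a Japanese-English acoustic model could consume bottleneck features from frozen native-English and native-Japanese acoustic models. This is a minimal illustration under assumed settings; all layer sizes, the number of senone targets, the frame-splicing dimensions, and the concatenation scheme are assumptions for illustration and are not taken from the paper.

```python
# Hypothetical sketch of a relevant phonetic-aware acoustic model.
# Architecture details (dimensions, depths, targets) are assumed, not the paper's.
import torch
import torch.nn as nn


class BottleneckDNN(nn.Module):
    """Feed-forward acoustic model with a narrow bottleneck layer whose
    activations serve as phonetic-aware auxiliary features."""

    def __init__(self, feat_dim, hidden_dim, bottleneck_dim, num_senones):
        super().__init__()
        self.front = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
        )
        self.bottleneck = nn.Linear(hidden_dim, bottleneck_dim)
        self.back = nn.Sequential(
            nn.ReLU(),
            nn.Linear(bottleneck_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_senones),
        )

    def forward(self, x):
        bn = self.bottleneck(self.front(x))
        return self.back(bn), bn  # senone logits, bottleneck features


class RelevantPhoneticAwareAM(nn.Module):
    """Japanese-English acoustic model whose input features are augmented with
    bottleneck features from frozen native-English and native-Japanese models."""

    def __init__(self, feat_dim, en_model, ja_model, hidden_dim, num_senones):
        super().__init__()
        self.en_model = en_model.eval()  # trained on native English speech
        self.ja_model = ja_model.eval()  # trained on native Japanese speech
        for p in self.en_model.parameters():
            p.requires_grad = False
        for p in self.ja_model.parameters():
            p.requires_grad = False
        bn_dim = (en_model.bottleneck.out_features
                  + ja_model.bottleneck.out_features)
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim + bn_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_senones),
        )

    def forward(self, x):
        with torch.no_grad():
            _, en_bn = self.en_model(x)  # English-like phonetic cues
            _, ja_bn = self.ja_model(x)  # Japanese-like phonetic cues
        return self.classifier(torch.cat([x, en_bn, ja_bn], dim=-1))


# Example: 40-dim filterbank features spliced over 11 frames -> 440-dim input.
en_am = BottleneckDNN(440, 1024, 40, 4000)
ja_am = BottleneckDNN(440, 1024, 40, 4000)
model = RelevantPhoneticAwareAM(440, en_am, ja_am, 1024, 4000)
logits = model(torch.randn(8, 440))  # (batch, senone logits)
```

In this sketch the two native acoustic models are trained in advance on their respective corpora and then frozen, so the Japanese-English model only learns how to combine their phonetic cues with the raw acoustic features; whether the paper freezes or fine-tunes the native models is not specified here and is left as an assumption.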