Cross-Lingual Transfer Learning Approach to Phoneme Error Detection via Latent Phonetic Representation
Jovan M. Dalhouse, K. Itou
Interspeech, pp. 3133-3137. Published 2022-09-18.
DOI: https://doi.org/10.21437/interspeech.2022-10228
Citation count: 1
Abstract
Computer-assisted language learning (CALL) systems for pronunciation error detection have been researched extensively as a way to automate language improvement through self-evaluation. However, many previous approaches rely on HMM or neural network hybrid models which, although proven effective, often require phonetically labelled L2 speech data, which is expensive and often scarce. This paper discusses a "zero-shot" transfer learning approach to detecting phonetic errors in L2 English speech by native Japanese speakers using solely unaligned, phonetically labelled native-language speech. The proposed method introduces a simple base architecture that uses the XLSR-Wav2Vec2.0 model pre-trained on unlabelled multilingual speech. Phoneme mapping for each language is determined from differences in the articulation of similar phonemes. The method achieved a Phonetic Error Rate of 0.214 on erroneous L2 speech after fine-tuning on 70 hours of speech with low-resource automated phonetic labelling, and it additionally proved to model phonemes of the L2 speaker's native language effectively without the need for fine-tuning on L2 speech.
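The abstract reports a Phonetic Error Rate but does not say how it is computed. A common convention for phoneme-level error rates is the Levenshtein (edit) distance between the recognized and reference phoneme sequences, divided by the reference length. The sketch below assumes that convention; the function names and the example phoneme symbols are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # phoneme in ref missing from hyp
            insertion = curr[j - 1] + 1  # extra phoneme in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: an /r/ realized as /l/ in a three-phoneme word.
reference = ["r", "ay", "t"]
recognized = ["l", "ay", "t"]
rate = phoneme_error_rate(reference, recognized)
```

Here one substitution over three reference phonemes gives a rate of one third. Under this assumed definition, the reported 0.214 would correspond to roughly one edit for every five reference phonemes.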