A. Sini, Antoine Perquin, Damien Lolive, Arnaud Delhay
2022 IEEE Spoken Language Technology Workshop (SLT). Published 2023-01-09. DOI: 10.1109/SLT54892.2023.10023182 (https://doi.org/10.1109/SLT54892.2023.10023182)
Phone-Level Pronunciation Scoring for L1 Using Weighted-Dynamic Time Warping
This paper presents a novel approach to phone-level pronunciation scoring. The proposed method follows the two usual stages of pronunciation scoring: an acoustic model transcribes the spoken utterance into a phoneme sequence, and Weighted-Dynamic Time Warping (W-DTW) then compares the predicted phoneme sequence against the reference one. Our approach alters the comparison step by considering Phonetic PosteriorGrams (PPG) rather than only the most probable phoneme sequence. This led us to propose a modified W-DTW algorithm that takes the probabilities of the predicted phonemes into account and uses articulatory features as a proxy for phonetic similarity. The results are satisfactory given the content of the adult speech database and are comparable to those of well-known state-of-the-art methods.
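To make the idea in the abstract concrete, the following is a minimal sketch of a DTW alignment between a PPG and a reference phoneme sequence. It is not the authors' algorithm: the abstract does not give the W-DTW weighting scheme, so the local cost here is simply assumed to be the posterior-weighted (expected) articulatory distance to the reference phoneme. The feature table `FEATURES`, the three-phoneme inventory, and the function names are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical articulatory features (place, manner, voicing), scaled to [0, 1].
# These values are illustrative assumptions, not taken from the paper.
FEATURES: Dict[str, Tuple[float, float, float]] = {
    "p": (0.0, 0.0, 0.0),
    "b": (0.0, 0.0, 1.0),
    "s": (0.5, 1.0, 0.0),
}


def phone_distance(a: str, b: str) -> float:
    """Mean absolute difference between two articulatory feature vectors."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(abs(x - y) for x, y in zip(fa, fb)) / len(fa)


def local_cost(posterior: Dict[str, float], ref_phone: str) -> float:
    """Expected articulatory distance to ref_phone under one PPG frame.

    Assumption: every predicted phoneme contributes in proportion to its
    posterior probability, instead of keeping only the most probable one.
    """
    return sum(prob * phone_distance(phone, ref_phone)
               for phone, prob in posterior.items())


def ppg_dtw_cost(ppg: Sequence[Dict[str, float]],
                 reference: Sequence[str]) -> float:
    """Accumulated cost of the best DTW path between a PPG and a reference."""
    n, m = len(ppg), len(reference)
    if n == 0 or m == 0:
        raise ValueError("ppg and reference must both be non-empty")
    inf = float("inf")
    # acc[i][j] is the best cost of aligning ppg[:i] with reference[:j].
    acc: List[List[float]] = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = local_cost(ppg[i - 1], reference[j - 1])
            acc[i][j] = cost + min(acc[i - 1][j],      # stay on same reference phone
                                   acc[i][j - 1],      # stay on same PPG frame
                                   acc[i - 1][j - 1])  # advance both
    return acc[n][m]


if __name__ == "__main__":
    frames = [{"p": 0.9, "b": 0.1}, {"s": 1.0}]
    print(ppg_dtw_cost(frames, ["p", "s"]))
```

In this sketch, a frame that puts some probability on /b/ where /p/ is expected adds a small cost, because the two hypothetical feature vectors differ only in voicing. A frame that is confidently /s/ in the same position would cost more. A per-phone score could then be read off the alignment path, but how the paper derives its scores is not stated in the abstract.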