Peppanet: Effective Mispronunciation Detection and Diagnosis Leveraging Phonetic, Phonological, and Acoustic Cues

Bi-Cheng Yan, Hsin-Wei Wang, Berlin Chen
{"title":"Peppanet: Effective Mispronunciation Detection and Diagnosis Leveraging Phonetic, Phonological, and Acoustic Cues","authors":"Bi-Cheng Yan, Hsin-Wei Wang, Berlin Chen","doi":"10.1109/SLT54892.2023.10022472","DOIUrl":null,"url":null,"abstract":"Mispronunciation detection and diagnosis (MDD) aims to detect erroneous pronunciation segments in an L2 learner's articulation and subsequently provide informative diagnostic feedback. Most existing neural methods follow a dictation-based modeling paradigm that finds out pronunciation errors and returns diagnostic feedback at the same time by aligning the recognized phone sequence uttered by an L2 learner to the corresponding canonical phone sequence of a given text prompt. However, the main downside of these methods is that the dictation process and alignment process are mostly made independent of each other. In view of this, we present a novel end-to-end neural method, dubbed PeppaNet, building on a unified structure that can jointly model the dictation process and the alignment process. The model of our method learns to directly predict the pronunciation correctness of each canonical phone of the text prompt and in turn provides its corresponding diagnostic feedback. In contrast to the conventional dictation-based methods that rely mainly on a free-phone recognition process, PeppaNet makes good use of an effective selective gating mechanism to simultaneously incorporate phonetic, phonological and acoustic cues to generate corrections that are more proper and phonetically related to the canonical pronunciations. Extensive sets of experiments conducted on the L2-ARCTIC benchmark dataset seem to show the merits of our proposed method in comparison to some recent top-of-the-line methods.","PeriodicalId":352002,"journal":{"name":"2022 IEEE Spoken Language Technology Workshop (SLT)","volume":"36 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Spoken Language Technology Workshop (SLT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SLT54892.2023.10022472","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Mispronunciation detection and diagnosis (MDD) aims to detect erroneous pronunciation segments in an L2 learner's articulation and subsequently provide informative diagnostic feedback. Most existing neural methods follow a dictation-based modeling paradigm that finds out pronunciation errors and returns diagnostic feedback at the same time by aligning the recognized phone sequence uttered by an L2 learner to the corresponding canonical phone sequence of a given text prompt. However, the main downside of these methods is that the dictation process and alignment process are mostly made independent of each other. In view of this, we present a novel end-to-end neural method, dubbed PeppaNet, building on a unified structure that can jointly model the dictation process and the alignment process. The model of our method learns to directly predict the pronunciation correctness of each canonical phone of the text prompt and in turn provides its corresponding diagnostic feedback. In contrast to the conventional dictation-based methods that rely mainly on a free-phone recognition process, PeppaNet makes good use of an effective selective gating mechanism to simultaneously incorporate phonetic, phonological and acoustic cues to generate corrections that are more proper and phonetically related to the canonical pronunciations. Extensive sets of experiments conducted on the L2-ARCTIC benchmark dataset seem to show the merits of our proposed method in comparison to some recent top-of-the-line methods.
Peppanet:利用语音、语音和声学线索有效的错误发音检测和诊断
错误发音检测和诊断(MDD)旨在检测二语学习者发音中的错误发音片段,并随后提供信息诊断反馈。大多数现有的神经方法都遵循基于听写的建模范式,通过将二语学习者发出的识别电话序列与给定文本提示的相应规范电话序列相匹配,发现发音错误并同时返回诊断反馈。然而,这些方法的主要缺点是听写过程和对齐过程大多是相互独立的。鉴于此,我们提出了一种新的端到端神经网络方法,称为PeppaNet,它建立在一个统一的结构上,可以联合建模听写过程和对齐过程。该方法的模型学习直接预测文本提示符的每个规范电话的发音正确性,并提供相应的诊断反馈。与传统的基于听写的方法主要依赖于自由电话识别过程相比,PeppaNet很好地利用了一种有效的选择性门控机制,同时结合语音、语音和声学线索来生成更合适的、与标准发音相关的纠正。在L2-ARCTIC基准数据集上进行的大量实验似乎表明,与最近的一些顶级方法相比,我们提出的方法具有优点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信