XDF-REPA: A Densely Labeled Dataset toward Refined Pronunciation Assessment for English Learning

Yun Gao, Zhigang Ou, Jianfeng Cheng, Yong Ruan, Xiangdong Wang, Yueliang Qian
Published in: 2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA)
Publication date: 2019-10-01
DOI: 10.1109/O-COCOSDA46868.2019.9041154
URL: https://doi.org/10.1109/O-COCOSDA46868.2019.9041154
Citations: 1

Abstract

Most current computer-assisted pronunciation training (CAPT) systems focus on overall scoring or mispronunciation detection. In this paper, we address refined pronunciation assessment (RPA), which aims to give L2 learners more fine-grained information. To address the lack of densely labeled data, a major challenge for RPA, we present the XDF-REPA dataset, which is freely available to the public. The dataset contains 19,213 English word utterances by 18 Chinese adults. Of these, 4,200 audio clips from 9 speakers are densely labeled by 3 linguists with the intended phoneme, the actually uttered phoneme, a score for each phoneme, and an overall score for the word. To reduce differences between annotators, we define scoring rules that combine subjective and objective criteria. To demonstrate the use of the dataset and provide a baseline for other researchers, we develop and describe a prototype RPA system that adopts a DNN-HMM based acoustic model and a variant of Goodness of Pronunciation (GOP) to produce all the corrective feedback needed for RPA. Experimental results show that, across subsets and linguists, error detection accuracy ranges from 80.1% to 85.1%, and accuracy of actually-uttered-phoneme recognition ranges from 70.9% to 80.8%.
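As an illustration of the dense labels and the two reported accuracies, the following Python sketch models one annotated word and scores a system's output against it. It is not the dataset's file format or the authors' evaluation code: the class and function names, the ARPAbet-style phoneme symbols, the score values, and the exact metric definitions (per-phoneme agreement with one linguist's labels) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhonemeLabel:
    intended: str  # phoneme the word calls for
    uttered: str   # phoneme the annotator judged was actually produced
    score: float   # per-phoneme score (the scale here is hypothetical)


@dataclass
class WordAnnotation:
    word: str
    phonemes: List[PhonemeLabel]
    word_score: float  # overall score for the word (hypothetical scale)


def _check(ref: List[PhonemeLabel], hyp: List[str]) -> None:
    # Both metrics assume one predicted phoneme per annotated phoneme.
    if not ref or len(ref) != len(hyp):
        raise ValueError("reference and prediction must be non-empty and aligned")


def error_detection_accuracy(ref: List[PhonemeLabel], hyp: List[str]) -> float:
    """Share of phonemes where the correct/mispronounced decision
    implied by the prediction matches the annotator's decision."""
    _check(ref, hyp)
    hits = sum((p.intended != p.uttered) == (p.intended != h)
               for p, h in zip(ref, hyp))
    return hits / len(ref)


def uttered_phoneme_accuracy(ref: List[PhonemeLabel], hyp: List[str]) -> float:
    """Share of phonemes where the predicted phoneme equals the
    phoneme the annotator labeled as actually uttered."""
    _check(ref, hyp)
    hits = sum(p.uttered == h for p, h in zip(ref, hyp))
    return hits / len(ref)


# Made-up example: "think" with TH produced as S and NG produced as N.
annotation = WordAnnotation(
    word="think",
    phonemes=[
        PhonemeLabel("TH", "S", 1.0),
        PhonemeLabel("IH", "IH", 5.0),
        PhonemeLabel("NG", "N", 2.0),
        PhonemeLabel("K", "K", 5.0),
    ],
    word_score=3.0,
)
# Hypothetical system output: one predicted uttered phoneme per position.
predicted = ["F", "IH", "NG", "K"]

print(error_detection_accuracy(annotation.phonemes, predicted))
print(uttered_phoneme_accuracy(annotation.phonemes, predicted))
```

In this made-up example the system flags TH as mispronounced but names the wrong substitute, so it counts toward error detection and not toward uttered-phoneme recognition. Under these assumed definitions, recognizing the uttered phoneme is the stricter task, which is consistent with the lower accuracy range reported for it.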