XDF-REPA: A Densely Labeled Dataset toward Refined Pronunciation Assessment for English Learning

2019 22nd Conference of the Oriental COCOSDA International Committee for the Co-ordination and Standardisation of Speech Databases and Assessment Techniques (O-COCOSDA). Pub Date: 2019-10-01. DOI: 10.1109/O-COCOSDA46868.2019.9041154

Yun Gao, Zhigang Ou, Jianfeng Cheng, Yong Ruan, Xiangdong Wang, Yueliang Qian


Citations: 1

Abstract

Currently, most computer-assisted pronunciation training (CAPT) systems focus on overall scoring or mispronunciation detection. In this paper, we address refined pronunciation assessment (RPA), which aims to give L2 learners more detailed feedback. To address the lack of densely labeled data, we present the XDF-REPA dataset, which is freely available to the public. The dataset contains 19,213 English word utterances by 18 Chinese adults, of which 4,200 audio clips from 9 speakers are densely labeled by 3 linguists with the intended phoneme, the actually uttered phoneme, a score for each phoneme, and an overall score for the word. To reduce differences between annotators, scoring rules combining subjective and objective criteria are defined. To demonstrate the use of the dataset and provide a baseline for other researchers, we develop and describe a prototype RPA system, which adopts a DNN-HMM-based acoustic model and a variant of Goodness of Pronunciation (GOP) to yield all the corrective feedback needed for RPA. Experimental results show that, across subsets and linguists, error detection accuracy ranges from 80.1% to 85.1% and accuracy of actually-uttered-phoneme recognition ranges from 70.9% to 80.8%.
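The abstract does not say which GOP variant the prototype uses, so the following is only a minimal sketch of one common posterior-based formulation, not the authors' method. It assumes each aligned phoneme segment comes with per-frame phoneme posteriors from the acoustic model; the names `PhonemeLabel`, `gop`, and `is_error`, the record layout, and the threshold value are all hypothetical. The score is the average log posterior of the intended phoneme minus that of the best-scoring phoneme, and the best-scoring phoneme doubles as a guess at the actually uttered phoneme.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PhonemeLabel:
    """Hypothetical record mirroring the per-phoneme labels named in the abstract."""
    intended: str   # phoneme the learner was supposed to produce
    uttered: str    # phoneme the annotator judged was actually produced
    score: float    # annotator's score for this phoneme


def gop(frame_posteriors: List[Dict[str, float]], intended: str) -> Tuple[float, str]:
    """Return (GOP score, most likely phoneme) for one aligned segment.

    Assumed formulation: mean log posterior of the intended phoneme minus
    the mean log posterior of the best-scoring phoneme. The result is 0 when
    the intended phoneme is the best match and negative otherwise.
    """
    if not frame_posteriors:
        raise ValueError("segment has no frames")
    n = len(frame_posteriors)
    # Average the log posterior of every candidate phoneme over the segment.
    avg_logp = {
        q: sum(math.log(frame[q]) for frame in frame_posteriors) / n
        for q in frame_posteriors[0]
    }
    best = max(avg_logp, key=avg_logp.get)
    return avg_logp[intended] - avg_logp[best], best


def is_error(score: float, threshold: float = -0.5) -> bool:
    """Flag a mispronunciation when GOP falls below an illustrative threshold."""
    return score < threshold


# Two frames of made-up posteriors over three phonemes.
frames = [
    {"ae": 0.7, "eh": 0.2, "ah": 0.1},
    {"ae": 0.6, "eh": 0.3, "ah": 0.1},
]
score, recognized = gop(frames, intended="eh")
print(recognized, is_error(score))
```

In practice the threshold would be tuned per phoneme on labeled data such as the densely annotated subset described above, and the recognized phoneme could be compared against the annotators' "actually uttered" label.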
