Crowdsourcing evaluation of the quality of automatically generated questions for supporting computer-assisted language teaching

IF 5.7 · Zone 1 (Literature) · Q1 EDUCATION & EDUCATIONAL RESEARCH

Recall · Pub Date: 2019-10-04 · DOI: 10.1017/S0958344019000193

Maria Chinkina, Simón Ruiz, Walt Detmar Meurers

Recall, volume 32, issue 1, pages 145-161. Publication type: Journal Article.

Citations: 5

Abstract

How can state-of-the-art computational linguistic technology reduce the workload and increase the efficiency of language teachers? To address this question, we combine insights from research in second language acquisition and computational linguistics to automatically generate text-based questions for a given text. The questions are designed to draw the learner’s attention to target linguistic forms – phrasal verbs, in this particular case – by requiring them to use the forms or their paraphrases in the answer. Such questions help learners create form-meaning connections and are well suited for both practice and testing. We discuss the generation of a novel type of question that combines a wh- question with a gapped sentence, and report the results of two crowdsourcing evaluation studies investigating how well automatically generated questions compare to those written by a language teacher. The first study compares our system output to gold-standard human-written questions via crowdsourced ratings. An equivalence test shows that automatically generated questions are comparable to human-written ones. The second crowdsourcing study investigates two types of questions (wh- questions with and without a gapped sentence), their perceived quality, and the responses they elicit. Finally, we discuss the challenges and limitations of creating and evaluating question-generation systems for language learners.

Source journal

Recall Multiple-

CiteScore

8.50

Self-citation rate

4.40%

Publication count