CIMA: A Large Open Access Dialogue Dataset for Tutoring

Workshop on Innovative Use of NLP for Building Educational Applications. Pub Date: 2020-07-01. DOI: 10.18653/v1/2020.bea-1.5

Katherine Stasaski, Kimberly Kao, Marti A. Hearst


Citations: 14

Abstract

One-to-one tutoring is often an effective means to help students learn, and recent experiments with neural conversation systems are promising. However, large open datasets of tutoring conversations are lacking. To remedy this, we propose a novel asynchronous method for collecting tutoring dialogue via crowdworkers that is both amenable to the needs of deep learning algorithms and reflective of pedagogical concerns. In this approach, extended conversations are obtained between crowdworkers role-playing as both students and tutors. The CIMA collection, which we make publicly available, is novel in that students are exposed to overlapping grounded concepts between exercises and multiple relevant tutoring responses are collected for the same input. CIMA contains several compelling properties from an educational perspective: student role-players complete exercises in fewer turns during the course of the conversation and tutor players adopt strategies that conform with some educational conversational norms, such as providing hints versus asking questions in appropriate contexts. The dataset enables a model to be trained to generate the next tutoring utterance in a conversation, conditioned on a provided action strategy.
