Workshop on Innovative Use of NLP for Building Educational Applications最新文献

Assessing the efficacy of large language models in generating accurate teacher responses 评估大型语言模型在产生准确教师反应方面的功效

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-07-09 DOI: 10.48550/arXiv.2307.04274

Yann Hicke, Abhishek Masand, Wentao Guo, Tushaar Gangavarapu

{"title":"Assessing the efficacy of large language models in generating accurate teacher responses","authors":"Yann Hicke, Abhishek Masand, Wentao Guo, Tushaar Gangavarapu","doi":"10.48550/arXiv.2307.04274","DOIUrl":"https://doi.org/10.48550/arXiv.2307.04274","url":null,"abstract":"(Tack et al., 2023) organized the shared task hosted by the 18th Workshop on Innovative Use of NLP for Building Educational Applications on generation of teacher language in educational dialogues. Following the structure of the shared task, in this study, we attempt to assess the generative abilities of large language models in providing informative and helpful insights to students, thereby simulating the role of a knowledgeable teacher. To this end, we present an extensive evaluation of several benchmarking generative models, including GPT-4 (few-shot, in-context learning), fine-tuned GPT-2, and fine-tuned DialoGPT. Additionally, to optimize for pedagogical quality, we fine-tuned the Flan-T5 model using reinforcement learning. Our experimental findings on the Teacher-Student Chatroom Corpus subset indicate the efficacy of GPT-4 over other fine-tuned models, measured using BERTScore and DialogRPT. We hypothesize that several dataset characteristics, including sampling, representativeness, and dialog completeness, pose significant challenges to fine-tuning, thus contributing to the poor generalizability of the fine-tuned models. Finally, we note the need for these generative models to be evaluated with a metric that relies not only on dialog coherence and matched language modeling distribution but also on the model’s ability to showcase pedagogical skills.","PeriodicalId":363390,"journal":{"name":"Workshop on Innovative Use of NLP for Building Educational Applications","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123967066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 1

Japanese Lexical Complexity for Non-Native Readers: A New Dataset 非母语读者的日语词汇复杂性:一个新的数据集

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-06-30 DOI: 10.48550/arXiv.2306.17399

Yusuke Ide, Masato Mita, Adam Nohejl, Hiroki Ouchi, Taro Watanabe

引用次数: 1

Beyond Black Box AI generated Plagiarism Detection: From Sentence to Document Level 超越黑盒AI生成的抄袭检测:从句子到文档级别

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-06-13 DOI: 10.48550/arXiv.2306.08122

Mujahid Ali Quidwai, Chun Xing Li, Parijat Dube

引用次数: 3

The BEA 2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues BEA 2023关于在教育对话中生成AI教师响应的共享任务

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-06-12 DOI: 10.48550/arXiv.2306.06941

Anaïs Tack, E. Kochmar, Zheng Yuan, Serge Bibauw, C. Piech

{"title":"The BEA 2023 Shared Task on Generating AI Teacher Responses in Educational Dialogues","authors":"Anaïs Tack, E. Kochmar, Zheng Yuan, Serge Bibauw, C. Piech","doi":"10.48550/arXiv.2306.06941","DOIUrl":"https://doi.org/10.48550/arXiv.2306.06941","url":null,"abstract":"This paper describes the results of the first shared task on generation of teacher responses in educational dialogues. The goal of the task was to benchmark the ability of generative language models to act as AI teachers, replying to a student in a teacherstudent dialogue. Eight teams participated in the competition hosted on CodaLab and experimented with a wide variety of state-of-the-art models, including Alpaca, Bloom, DialoGPT, DistilGPT-2, Flan-T5, GPT- 2, GPT-3, GPT-4, LLaMA, OPT-2.7B, and T5- base. Their submissions were automatically scored using BERTScore and DialogRPT metrics, and the top three among them were further manually evaluated in terms of pedagogical ability based on Tack and Piech (2022). The NAISTeacher system, which ranked first in both automated and human evaluation, generated responses with GPT-3.5 Turbo using an ensemble of prompts and DialogRPT-based ranking of responses for given dialogue contexts. Despite promising achievements of the participating teams, the results also highlight the need for evaluation metrics better suited to educational contexts.","PeriodicalId":363390,"journal":{"name":"Workshop on Innovative Use of NLP for Building Educational Applications","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130622539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 3

Gender-Inclusive Grammatical Error Correction through Augmentation 性别包容性语法错误的强化纠正

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-06-12 DOI: 10.48550/arXiv.2306.07415

Gunnar Lund, Kostiantyn Omelianchuk, Igor Samokhin

引用次数: 0

Span Identification of Epistemic Stance-Taking in Academic Written English 学术写作中认知立场的跨域识别

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-06-03 DOI: 10.48550/arXiv.2306.02038

Masaki Eguchi, K. Kyle

引用次数: 0

UKP-SQuARE: An Interactive Tool for Teaching Question Answering UKP-SQuARE:教学问答的互动工具

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-05-31 DOI: 10.48550/arXiv.2305.19748

Haishuo Fang, Haritz Puerto, Iryna Gurevych

{"title":"UKP-SQuARE: An Interactive Tool for Teaching Question Answering","authors":"Haishuo Fang, Haritz Puerto, Iryna Gurevych","doi":"10.48550/arXiv.2305.19748","DOIUrl":"https://doi.org/10.48550/arXiv.2305.19748","url":null,"abstract":"The exponential growth of question answering (QA) has made it an indispensable topic in any Natural Language Processing (NLP) course. Additionally, the breadth of QA derived from this exponential growth makes it an ideal scenario for teaching related NLP topics such as information retrieval, explainability, and adversarial attacks among others. In this paper, we introduce UKP-SQuARE as a platform for QA education. This platform provides an interactive environment where students can run, compare, and analyze various QA models from different perspectives, such as general behavior, explainability, and robustness. Therefore, students can get a first-hand experience in different QA techniques during the class. Thanks to this, we propose a learner-centered approach for QA education in which students proactively learn theoretical concepts and acquire problem-solving skills through interactive exploration, experimentation, and practical assignments, rather than solely relying on traditional lectures. To evaluate the effectiveness of UKP-SQuARE in teaching scenarios, we adopted it in a postgraduate NLP course and surveyed the students after the course. Their positive feedback shows the platform’s effectiveness in their course and invites a wider adoption.","PeriodicalId":363390,"journal":{"name":"Workshop on Innovative Use of NLP for Building Educational Applications","volume":"78 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-05-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126229352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 0

LFTK: Handcrafted Features in Computational Linguistics 计算语言学中的手工特征

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2023-05-25 DOI: 10.48550/arXiv.2305.15878

Bruce W. Lee, J. Lee

引用次数: 3

Interpreting Neural CWI Classifiers’ Weights as Vocabulary Size 解释神经CWI分类器的权重作为词汇量

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.bea-1.17

Yo Ehara

引用次数: 4

Complementary Systems for Off-Topic Spoken Response Detection 离题口语反应检测的补充系统

Workshop on Innovative Use of NLP for Building Educational Applications Pub Date : 2020-07-01 DOI: 10.18653/v1/2020.bea-1.4

V. Raina, M. Gales, K. Knill

{"title":"Complementary Systems for Off-Topic Spoken Response Detection","authors":"V. Raina, M. Gales, K. Knill","doi":"10.18653/v1/2020.bea-1.4","DOIUrl":"https://doi.org/10.18653/v1/2020.bea-1.4","url":null,"abstract":"Increased demand to learn English for business and education has led to growing interest in automatic spoken language assessment and teaching systems. With this shift to automated approaches it is important that systems reliably assess all aspects of a candidate’s responses. This paper examines one form of spoken language assessment; whether the response from the candidate is relevant to the prompt provided. This will be referred to as off-topic spoken response detection. Two forms of previously proposed approaches are examined in this work: the hierarchical attention-based topic model (HATM); and the similarity grid model (SGM). The work focuses on the scenario when the prompt, and associated responses, have not been seen in the training data, enabling the system to be applied to new test scripts without the need to collect data or retrain the model. To improve the performance of the systems for unseen prompts, data augmentation based on easy data augmentation (EDA) and translation based approaches are applied. Additionally for the HATM, a form of prompt dropout is described. The systems were evaluated on both seen and unseen prompts from Linguaskill Business and General English tests. For unseen data the performance of the HATM was improved using data augmentation, in contrast to the SGM where no gains were obtained. The two approaches were found to be complementary to one another, yielding a combined F0.5 score of 0.814 for off-topic response detection where the prompts have not been seen in training.","PeriodicalId":363390,"journal":{"name":"Workshop on Innovative Use of NLP for Building Educational Applications","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129156514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4