CQuAE: A new Contextualized QUestion Answering corpus on Education domain

IF 2.7 · Zone 3, Computer Science · JCR Q3, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Data & Knowledge Engineering · Pub Date: 2024-04-15 · DOI: 10.1016/j.datak.2024.102305

Thomas Gerald, Louis Tamames, Sofiane Ettayeb, Ha-Quang Le, Patrick Paroubek, Anne Vilnat

Volume 151, Article 102305 · https://www.sciencedirect.com/science/article/pii/S0169023X24000296

Citations: 0

Abstract

Generating education-related questions and answers remains an open problem, even though such questions would be useful to students, teachers, and teaching aids. Given textual course material, we are interested in generating non-factual questions that require an elaborate answer (one relying on analysis or reasoning). Although annotated question-answering corpora exist, developing a deep-learning-based generator faces two main challenges. First, freely accessible, high-quality data are insufficient to train generative approaches. Second, a stand-alone application has no explicit support to guide generation toward complex questions. To tackle the first issue, we propose a new corpus built from educational documents. For the second, we study several retargetable language algorithms that produce answers by extracting text spans from contextual documents, and these answers then help generate questions. In particular, we study the contribution of deep neural syntactic parsing and transformer-based semantic representations, taking into account the question type (according to our own question typology) and the contextual supporting text span. Finally, since recent advances in generation models have demonstrated the effectiveness of instruction-based approaches to natural language generation, we present a first investigation of very large language models for generating education-related questions.
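The abstract describes conditioning question generation on a question type and on a supporting text span extracted from the course document, and feeding such inputs to instruction-based large language models. As a minimal sketch of how those inputs might be packed into a single instruction (not the authors' prompt or pipeline), the Python below assembles one prompt string. The question-type names, the `GenerationInput` structure, the `build_instruction` helper, and the instruction wording are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical question types, for illustration only; the paper defines its own typology.
QUESTION_TYPES = {"factual", "definition", "explanation", "synthesis"}


@dataclass
class GenerationInput:
    context: str        # course passage the question should be about
    support_span: str   # span extracted from the context that the answer relies on
    question_type: str  # one of the hypothetical QUESTION_TYPES


def build_instruction(inp: GenerationInput) -> str:
    """Assemble a hypothetical instruction-style prompt asking an LLM for one question."""
    if inp.question_type not in QUESTION_TYPES:
        raise ValueError(f"unknown question type: {inp.question_type!r}")
    # The support span is meant to be extracted from the context, so enforce that here.
    if inp.support_span not in inp.context:
        raise ValueError("support span must be a substring of the context")
    return (
        "You are writing exam questions for students.\n"
        f"Course material:\n{inp.context}\n\n"
        f"Answer support (quoted from the material):\n{inp.support_span}\n\n"
        f"Write one {inp.question_type} question whose answer requires "
        "the support passage above. Return only the question."
    )


# Usage with a made-up course passage.
doc = ("Photosynthesis converts light energy into chemical energy. "
       "It takes place in chloroplasts.")
prompt = build_instruction(
    GenerationInput(doc, "It takes place in chloroplasts.", "explanation")
)
print(prompt)
```

Keeping the span check inside the builder mirrors the idea that answers are extracted from the contextual document rather than written freely; the resulting string could then be sent to whichever instruction-tuned model is being evaluated.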




Source journal

Data & Knowledge Engineering (Engineering & Technology, Computer Science: Artificial Intelligence)

CiteScore: 5.00
Self-citation rate: 0.00%
Review time: 6 months

About the journal: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between two related fields of interest, data engineering and knowledge engineering. DKE reaches a worldwide audience of researchers, designers, managers, and users. The journal's major aim is to identify, investigate, and analyze the underlying principles in the design and effective use of data and knowledge systems.