CQuAE: A new Contextualized QUestion Answering corpus on Education domain

Impact factor 2.7; Zone 3 (Computer Science); JCR Q3 (Computer Science, Artificial Intelligence)
Thomas Gerald, Louis Tamames, Sofiane Ettayeb, Ha-Quang Le, Patrick Paroubek, Anne Vilnat
Data & Knowledge Engineering, vol. 151, Article 102305
Published: 2024-04-15
DOI: 10.1016/j.datak.2024.102305
URL: https://www.sciencedirect.com/science/article/pii/S0169023X24000296
Citations: 0

Abstract

Generating education-related questions and answers remains an open problem, although it would be useful for students, teachers, and teaching aids. Given textual course material, we aim to generate non-factual questions that require an elaborate answer, one that relies on analysis or reasoning. Although annotated question-answer corpora exist, developing a deep-learning generator faces two main challenges. First, freely accessible, high-quality data are insufficient to train generative approaches. Second, a stand-alone application has no explicit support to guide generation toward complex questions. To address the first challenge, we propose a new corpus based on educational documents. To address the second, we study several retargetable language algorithms that produce answers by extracting text spans from contextual documents; these answers in turn support question generation. In particular, we study the contribution of deep neural syntactic parsing and transformer-based semantic representations, taking into account the question type (according to our own question typology) and the contextual support text span. In addition, recent advances in generation models have demonstrated the effectiveness of instruction-based approaches to natural language generation. We therefore present a first investigation of very large language models for generating questions in the education domain.
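To make the idea of conditioning generation on a question type and a support span more concrete, here is a minimal sketch of how an instruction prompt might be assembled. It is not the authors' implementation: the names `QGExample`, `build_prompt`, and `QUESTION_TYPES`, the two type labels, and the prompt wording are all hypothetical, since the abstract does not spell out the typology or the prompt format.

```python
from dataclasses import dataclass

# Hypothetical question-type labels; the paper's actual typology
# is not described in the abstract.
QUESTION_TYPES = {
    "factual": "a question whose answer is stated directly in the answer span",
    "reasoning": "a question whose answer requires analysis or reasoning over the answer span",
}


@dataclass
class QGExample:
    context: str        # course text the question is grounded in
    support_span: str   # text span extracted from the context, used as the answer
    question_type: str  # key into QUESTION_TYPES


def build_prompt(example: QGExample) -> str:
    """Assemble an instruction prompt for a question generator (assumed format)."""
    if example.question_type not in QUESTION_TYPES:
        raise ValueError(f"unknown question type: {example.question_type!r}")
    # The span is meant to be extracted from the context, so require a verbatim match.
    if example.support_span not in example.context:
        raise ValueError("support span must appear verbatim in the context")
    return (
        "You are writing questions for students.\n"
        f"Course text: {example.context}\n"
        f"Answer span: {example.support_span}\n"
        f"Write {QUESTION_TYPES[example.question_type]}.\n"
        "Question:"
    )


if __name__ == "__main__":
    demo = QGExample(
        context="Plants convert light into chemical energy through photosynthesis.",
        support_span="through photosynthesis",
        question_type="reasoning",
    )
    print(build_prompt(demo))
```

The resulting string could be passed to any instruction-following language model. The verbatim-match check reflects the extractive setting described above, where the answer is a span taken from the contextual document.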

Source journal
Data & Knowledge Engineering (category: Engineering and Technology, Computer Science: Artificial Intelligence)
CiteScore: 5.00
Self-citation rate: 0.00%
Articles published: 66
Review time: 6 months
Journal description: Data & Knowledge Engineering (DKE) stimulates the exchange of ideas and interaction between these two related fields of interest. DKE reaches a world-wide audience of researchers, designers, managers and users. The major aim of the journal is to identify, investigate and analyze the underlying principles in the design and effective use of these systems.