CORWA: A Citation-Oriented Related Work Annotation Dataset

Xiangci Li, Biswadip Mandal, Jessica Ouyang
Venue: North American Chapter of the Association for Computational Linguistics
Published: 2022-05-07
DOI: 10.48550/arXiv.2205.03512 (https://doi.org/10.48550/arXiv.2205.03512)
Citation count: 5

Abstract

Academic research is an exploratory activity aimed at discovering new solutions to problems. By its nature, a research paper reviews the literature to distinguish its novelty from prior work. In natural language processing, this literature review usually appears in the “Related Work” section. The task of related work generation is to automatically generate the related work section given the rest of the research paper and a list of papers to cite. Prior work on this task has treated the sentence as the basic unit of generation, neglecting the fact that related work sections consist of variable-length text fragments derived from different information sources. As a first step toward a linguistically motivated related work generation framework, we present the Citation-Oriented Related Work Annotation (CORWA) dataset, which labels different types of citation text fragments from different information sources. We train a strong baseline model that automatically tags large amounts of unlabeled related work section text with CORWA labels. We further suggest a novel framework for human-in-the-loop, iterative, abstractive related work generation.