KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding

Zijun Yao, Yan-Tie Liu, Xin Lv, S. Cao, Jifan Yu, Lei Hou, Juanzi Li
{"title":"KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding","authors":"Zijun Yao, Yan-Tie Liu, Xin Lv, S. Cao, Jifan Yu, Lei Hou, Juanzi Li","doi":"10.48550/arXiv.2307.03115","DOIUrl":null,"url":null,"abstract":"Deep text understanding, which requires the connections between a given document and prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have encountered two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRc in this paper. Compared with previous benchmarks, KoRC has two advantages, i.e., broad knowledge coverage and flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgable questions. Moreover, we use labels in knowledge bases rather than spans or choices as the final answers. We test state-of-the-art models on KoRC and the experimental results show that the strongest baseline only achieves 68.3% and 30.0% F1 measure in the in-distribution and out-of-distribution test set, respectively. These results indicate that deep text understanding is still an unsolved challenge. The benchmark dataset, leaderboard, and baseline methods are released in https://github.com/THU-KEG/KoRC.","PeriodicalId":352845,"journal":{"name":"Annual Meeting of the Association for Computational Linguistics","volume":"23 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-07-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Annual Meeting of the Association for Computational Linguistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2307.03115","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Deep text understanding, which requires the connections between a given document and prior knowledge beyond its text, has been highlighted by many benchmarks in recent years. However, these benchmarks have encountered two major limitations. On the one hand, most of them require human annotation of knowledge, which leads to limited knowledge coverage. On the other hand, they usually use choices or spans in the texts as the answers, which results in narrow answer space. To overcome these limitations, we build a new challenging benchmark named KoRc in this paper. Compared with previous benchmarks, KoRC has two advantages, i.e., broad knowledge coverage and flexible answer format. Specifically, we utilize massive knowledge bases to guide annotators or large language models (LLMs) to construct knowledgable questions. Moreover, we use labels in knowledge bases rather than spans or choices as the final answers. We test state-of-the-art models on KoRC and the experimental results show that the strongest baseline only achieves 68.3% and 30.0% F1 measure in the in-distribution and out-of-distribution test set, respectively. These results indicate that deep text understanding is still an unsolved challenge. The benchmark dataset, leaderboard, and baseline methods are released in https://github.com/THU-KEG/KoRC.
面向知识的深度文本理解阅读理解基准
深度文本理解需要给定文档和文本之外的先验知识之间的联系,近年来已被许多基准所强调。然而,这些基准测试遇到了两个主要的限制。一方面,它们大多需要人类对知识进行注释,这导致知识覆盖范围有限。另一方面,他们通常使用文本中的选择或跨度作为答案,这导致答案空间狭窄。为了克服这些限制,我们在本文中构建了一个名为KoRc的新的具有挑战性的基准。与以前的基准测试相比,KoRC有两个优势,即知识覆盖面广,答案格式灵活。具体来说,我们利用大量的知识库来指导注释者或大型语言模型(llm)构建知识问题。此外,我们在知识库中使用标签而不是跨度或选择作为最终答案。我们在KoRC上测试了最先进的模型,实验结果表明,最强基线在分布内和分布外测试集中分别只达到68.3%和30.0%的F1度量。这些结果表明,深度文本理解仍然是一个未解决的挑战。基准数据集、排行榜和基线方法发布在https://github.com/THU-KEG/KoRC。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信