Towards Benchmarking and Improving the Temporal Reasoning Capability of Large Language Models

Annual Meeting of the Association for Computational Linguistics. Pub Date: 2023-06-15. DOI: 10.48550/arXiv.2306.08952

Qingyu Tan, H. Ng, Lidong Bing

Citations: 6

Abstract

Reasoning about time is of fundamental importance. Many facts are time-dependent. For example, athletes change teams from time to time, and different government officials are elected periodically. Previous time-dependent question answering (QA) datasets tend to be biased in either their coverage of time spans or question types. In this paper, we introduce a comprehensive probing dataset, TempReason, to evaluate the temporal reasoning capability of large language models. Our dataset includes questions of three temporal reasoning levels. We also propose a novel learning framework to improve the temporal reasoning capability of large language models, based on temporal span extraction and time-sensitive reinforcement learning. We conducted experiments in closed book QA, open book QA, and reasoning QA settings and demonstrated the effectiveness of our approach.
