Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Sejin Kim, Sundong Kim
{"title":"Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus","authors":"Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Sejin Kim, Sundong Kim","doi":"arxiv-2403.11793","DOIUrl":null,"url":null,"abstract":"The existing methods for evaluating the inference abilities of Large Language\nModels (LLMs) have been results-centric, making it difficult to assess the\ninference process. We introduce a new approach using the Abstract and Reasoning\nCorpus (ARC) dataset to evaluate the inference and contextual understanding\nabilities of large language models in a process-centric manner. ARC demands\nrigorous logical structures for problem-solving, making it a benchmark that\nfacilitates the comparison of model inference abilities with humans.\nExperimental results confirm that while large language models possess weak\ninference abilities, they still lag in terms of logical coherence,\ncompositionality, and productivity. Our experiments highlight the reasoning\ncapabilities of LLMs, proposing development paths for achieving human-level\nreasoning.","PeriodicalId":501033,"journal":{"name":"arXiv - CS - Symbolic Computation","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Symbolic Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2403.11793","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach that uses the Abstraction and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with those of humans. Experimental results confirm that while large language models possess weak inference abilities, they still lag in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs and propose development paths toward human-level reasoning.
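The abstract does not spell out the evaluation protocol, but ARC tasks are published as JSON files with "train" demonstration pairs and "test" pairs of integer grids, and task success is scored by exact match on the output grid. The sketch below is a minimal illustration, under those assumptions, of how such a task could be rendered as a prompt and a model's answer checked; solve_with_llm is a hypothetical placeholder, not the authors' pipeline.

```python
# Minimal sketch of feeding an ARC task to a language model and scoring it.
# Assumption: tasks follow ARC's standard JSON layout, where "train" holds
# demonstration input/output pairs and "test" holds the pair(s) to solve;
# each grid is a list of rows of integers 0-9.

from typing import Dict, List

Grid = List[List[int]]


def format_prompt(task: Dict) -> str:
    """Render the demonstration pairs and the test input as plain text."""
    lines = []
    for i, pair in enumerate(task["train"]):
        lines.append(f"Example {i + 1} input:  {pair['input']}")
        lines.append(f"Example {i + 1} output: {pair['output']}")
    lines.append(f"Test input: {task['test'][0]['input']}")
    lines.append("Test output:")
    return "\n".join(lines)


def solve_with_llm(prompt: str) -> Grid:
    """Hypothetical placeholder for a call to the model under evaluation."""
    raise NotImplementedError("plug in the model of choice here")


def exact_match(prediction: Grid, target: Grid) -> bool:
    """ARC scoring is all-or-nothing: every cell must match."""
    return prediction == target


if __name__ == "__main__":
    # A toy task for illustration: the transformation doubles every cell.
    task = {
        "train": [
            {"input": [[1, 2], [3, 4]], "output": [[2, 4], [6, 8]]},
        ],
        "test": [
            {"input": [[2, 0], [1, 3]], "output": [[4, 0], [2, 6]]},
        ],
    }
    print(format_prompt(task))
    # prediction = solve_with_llm(format_prompt(task))
    # print(exact_match(prediction, task["test"][0]["output"]))
```

A results-centric evaluation would stop at exact_match; the process-centric analysis described in the abstract would additionally inspect the intermediate reasoning the model produces before its final grid.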