Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Sejin Kim, Sundong Kim
{"title":"Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus","authors":"Seungpil Lee, Woochang Sim, Donghyeon Shin, Sanha Hwang, Wongyu Seo, Jiwon Park, Seokki Lee, Sejin Kim, Sundong Kim","doi":"arxiv-2403.11793","DOIUrl":null,"url":null,"abstract":"The existing methods for evaluating the inference abilities of Large Language\nModels (LLMs) have been results-centric, making it difficult to assess the\ninference process. We introduce a new approach using the Abstract and Reasoning\nCorpus (ARC) dataset to evaluate the inference and contextual understanding\nabilities of large language models in a process-centric manner. ARC demands\nrigorous logical structures for problem-solving, making it a benchmark that\nfacilitates the comparison of model inference abilities with humans.\nExperimental results confirm that while large language models possess weak\ninference abilities, they still lag in terms of logical coherence,\ncompositionality, and productivity. Our experiments highlight the reasoning\ncapabilities of LLMs, proposing development paths for achieving human-level\nreasoning.","PeriodicalId":501033,"journal":{"name":"arXiv - CS - Symbolic Computation","volume":"19 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Symbolic Computation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2403.11793","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been results-centric, making it difficult to assess the inference process. We introduce a new approach that uses the Abstraction and Reasoning Corpus (ARC) dataset to evaluate the inference and contextual understanding abilities of large language models in a process-centric manner. ARC demands rigorous logical structures for problem-solving, making it a benchmark that facilitates the comparison of model inference abilities with those of humans. Experimental results confirm that while large language models possess weak inference abilities, they still lag in terms of logical coherence, compositionality, and productivity. Our experiments highlight the reasoning capabilities of LLMs and propose development paths toward human-level reasoning.
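The abstract does not spell out the evaluation protocol, but ARC tasks are published as JSON files with "train" demonstration pairs and "test" pairs of integer grids, and task success is scored by exact match on the output grid. The sketch below is a minimal illustration, under those assumptions, of how such a task could be rendered as a prompt and a model's answer checked; solve_with_llm is a hypothetical placeholder, not the authors' pipeline.

```python
# Minimal sketch of feeding an ARC task to a language model and scoring it.
# Assumption: tasks follow ARC's standard JSON layout, where "train" holds
# demonstration input/output pairs and "test" holds the pair(s) to solve;
# each grid is a list of rows of integers 0-9.

from typing import Dict, List

Grid = List[List[int]]


def format_prompt(task: Dict) -> str:
    """Render the demonstration pairs and the test input as plain text."""
    lines = []
    for i, pair in enumerate(task["train"]):
        lines.append(f"Example {i + 1} input:  {pair['input']}")
        lines.append(f"Example {i + 1} output: {pair['output']}")
    lines.append(f"Test input: {task['test'][0]['input']}")
    lines.append("Test output:")
    return "\n".join(lines)


def solve_with_llm(prompt: str) -> Grid:
    """Hypothetical placeholder for a call to the model under evaluation."""
    raise NotImplementedError("plug in the model of choice here")


def exact_match(prediction: Grid, target: Grid) -> bool:
    """ARC scoring is all-or-nothing: every cell must match."""
    return prediction == target


if __name__ == "__main__":
    # A toy task for illustration: the transformation doubles every cell.
    task = {
        "train": [
            {"input": [[1, 2], [3, 4]], "output": [[2, 4], [6, 8]]},
        ],
        "test": [
            {"input": [[2, 0], [1, 3]], "output": [[4, 0], [2, 6]]},
        ],
    }
    print(format_prompt(task))
    # prediction = solve_with_llm(format_prompt(task))
    # print(exact_match(prediction, task["test"][0]["output"]))
```

A results-centric evaluation would stop at exact_match; the process-centric analysis described in the abstract would additionally inspect the intermediate reasoning the model produces before its final grid.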