PutnamBench: Evaluating Neural Theorem-Provers on the Putnam Mathematical Competition

arXiv - CS - Programming Languages | Pub Date: 2024-07-15 | DOI: arxiv-2407.11214

George Tsoukalas, Jasper Lee, John Jennings, Jimmy Xin, Michelle Ding, Michael Jennings, Amitayush Thakur, Swarat Chaudhuri

Citations: 0

Abstract

We present PutnamBench, a new multilingual benchmark for evaluating the ability of neural theorem-provers to solve competition mathematics problems. PutnamBench consists of 1697 hand-constructed formalizations of 640 theorems sourced from the William Lowell Putnam Mathematical Competition, the premier undergraduate-level mathematics competition in North America. All the theorems have formalizations in Lean 4 and Isabelle; a substantial subset also has Coq formalizations. Proving the theorems requires significant problem-solving ability and proficiency in a broad range of topics taught in undergraduate mathematics courses. We use PutnamBench to evaluate several established neural and symbolic theorem-provers. These approaches can only solve a handful of the PutnamBench problems, establishing the benchmark as a difficult open challenge for research on neural theorem-proving. PutnamBench is available at https://github.com/trishullab/PutnamBench.
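To give a sense of what a "formalization" means here, the following is a minimal Lean 4 sketch of a formal statement with its proof left open. It is a hypothetical toy example (the sum of the first n odd numbers is n squared), not an entry from PutnamBench, and the name `toy_sum_of_odds` is invented for illustration. It assumes that a benchmark entry fixes the theorem statement and that the prover's task is to supply the proof in place of `sorry`.

```lean
-- Hypothetical toy statement for illustration only; not a PutnamBench problem.
-- The statement is fixed; a theorem-prover would have to replace `sorry`
-- with a proof that Lean accepts.
theorem toy_sum_of_odds (n : Nat) :
    (List.range n).foldl (fun acc k => acc + (2 * k + 1)) 0 = n * n := by
  sorry
```

Actual Putnam problems are considerably harder to state and prove than this sketch, and typically draw on library definitions from analysis, algebra, and combinatorics.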



