FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang
arXiv - CS - Mathematical Software, published 2024-06-20. DOI: arxiv-2406.14408 (https://doi.org/arxiv-2406.14408)

Abstract

Formal verification (FV) has grown in significance as evolving large language models (LLMs) are increasingly used for program synthesis. However, current formal verification relies mainly on symbolic verifiers or hand-crafted rules, which limits how extensive and flexible verification can be. On the other hand, formal languages for automated theorem proving, such as Isabelle, offer another line of rigorous verification and are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms the code to be verified into Isabelle and then conducts verification via neural automated theorem proving with an LLM. This joint paradigm leverages the rigorous yet abundant, well-organized rules in Isabelle and also makes it convenient to introduce and adjust cutting-edge LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER dataset includes code dependencies and verification processes formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646 proof steps in total, with in-depth dependencies. We benchmark FVELER in the FVEL environment by first fine-tuning LLMs on FVELER and then evaluating them on Code2Inv and SV-COMP. The results show that, in SV-COMP, FVEL with FVELER-fine-tuned Llama3-8B solves 17.39% (69 -> 81) more problems, and with Mistral-7B 12% (75 -> 84) more problems. The proportion of proof errors is also reduced. Project page: https://fveler.github.io/.
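
The relative gains quoted in the abstract follow directly from the solved-problem counts it reports for SV-COMP. A minimal sketch of that arithmetic in Python; the helper name `relative_gain` and the `results` dictionary are illustrative and not taken from the paper or its code:

```python
def relative_gain(before: int, after: int) -> float:
    """Return the percentage increase from `before` to `after` solved problems."""
    if before <= 0:
        raise ValueError("baseline count must be positive")
    return (after - before) / before * 100.0


# Solved-problem counts on SV-COMP as reported in the abstract
# (before -> after fine-tuning on FVELER).
results = {"Llama3-8B": (69, 81), "Mistral-7B": (75, 84)}

for model, (before, after) in results.items():
    gain = relative_gain(before, after)
    print(f"{model}: {before} -> {after} (+{gain:.2f}%)")
```

Here 12 additional problems over a baseline of 69 gives roughly 17.39%, and 9 additional problems over a baseline of 75 gives 12%, matching the reported figures.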