FVEL: Interactive Formal Verification Environment with Large Language Models via Theorem Proving

arXiv - CS - Mathematical Software. Pub Date: 2024-06-20. DOI: arxiv-2406.14408

Xiaohan Lin, Qingxing Cao, Yinya Huang, Haiming Wang, Jianqiao Lu, Zhengying Liu, Linqi Song, Xiaodan Liang



Abstract


Formal verification (FV) has grown in importance as evolving large language models (LLMs) are increasingly used for program synthesis. However, current formal verification relies mainly on symbolic verifiers or hand-crafted rules, which limits how extensive and flexible verification can be. Formal languages for automated theorem proving, such as Isabelle, offer another line of rigorous verification and are maintained with comprehensive rules and theorems. In this paper, we propose FVEL, an interactive Formal Verification Environment with LLMs. Specifically, FVEL transforms the code to be verified into Isabelle and then conducts verification via neural automated theorem proving with an LLM. This joint paradigm leverages the rigorous, abundant, and well-organized rules in Isabelle, and also makes it convenient to introduce and adjust cutting-edge LLMs. To achieve this goal, we extract a large-scale dataset, FVELER. The FVELER dataset includes code dependencies and verification processes formulated in Isabelle, containing 758 theories, 29,125 lemmas, and 200,646 proof steps in total, with in-depth dependencies. We benchmark FVELER in the FVEL environment by first fine-tuning LLMs with FVELER and then evaluating them on Code2Inv and SV-COMP. The results show that, on SV-COMP, FVEL with FVELER-fine-tuned Llama3-8B solves 17.39% more problems (69 -> 81), and with Mistral-7B solves 12% more problems (75 -> 84). The proportion of proof errors is also reduced. Project page: https://fveler.github.io/.
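The percentages quoted for SV-COMP are relative gains over the number of problems solved before fine-tuning. As a quick check of that arithmetic, here is a minimal sketch; the function name `relative_gain` and the `results` dictionary are illustrative names introduced here, not part of FVEL or FVELER, and only the four counts come from the abstract.

```python
def relative_gain(before: int, after: int) -> float:
    """Relative increase in solved problems, as a percentage of the baseline."""
    return (after - before) / before * 100.0


# Solved-problem counts on SV-COMP as quoted in the abstract (before -> after).
# The dictionary layout is an assumption made for illustration only.
results = {"Llama3-8B": (69, 81), "Mistral-7B": (75, 84)}

for model, (before, after) in results.items():
    gain = relative_gain(before, after)
    print(f"{model}: {before} -> {after} (+{gain:.2f}%)")
```

Both values agree with the reported figures: 12 additional problems over a baseline of 69 is about 17.39%, and 9 additional problems over a baseline of 75 is 12%.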
