RethinkMCTS: Refining Erroneous Thoughts in Monte Carlo Tree Search for Code Generation

arXiv - CS - Software Engineering, Pub Date: 2024-09-15, DOI: arxiv-2409.09584

Qingyao Li, Wei Xia, Kounianhua Du, Xinyi Dai, Ruiming Tang, Yasheng Wang, Yong Yu, Weinan Zhang


Abstract

LLM agents enhanced by tree search algorithms have achieved notable performance in code generation. However, current search algorithms in this domain suffer from low search quality for several reasons: 1) ineffective design of the search space for the high reasoning demands of code generation tasks, 2) inadequate integration of code feedback with the search algorithm, and 3) poor handling of negative feedback during the search, which reduces search efficiency and quality. To address these challenges, we propose to search over the reasoning process behind the code and to use detailed code execution feedback to refine erroneous thoughts during the search. In this paper, we introduce RethinkMCTS, which employs the Monte Carlo Tree Search (MCTS) algorithm to conduct thought-level searches before generating code, thereby exploring a wider range of strategies. More importantly, we construct verbal feedback from fine-grained code execution feedback to refine erroneous thoughts during the search. This keeps the search progressing along correct reasoning paths and so improves the overall search quality of the tree. Through extensive experiments, we demonstrate that RethinkMCTS outperforms previous search-based and feedback-based code generation baselines. On the HumanEval dataset, it improves the pass@1 of GPT-3.5-turbo from 70.12 to 89.02 and of GPT-4o-mini from 87.20 to 94.51. It conducts more thorough exploration through thought-level searches and enhances the search quality of the entire tree by incorporating the rethink operation.
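To make the idea of a thought-level search with a rethink step more concrete, the following is a minimal sketch and not the authors' implementation. It assumes three hypothetical callbacks that stand in for what would be LLM calls and code execution in a real system: `propose` suggests candidate next thoughts, `evaluate` returns a score in [0, 1] together with verbal feedback, and `rethink` rewrites the last thought given that feedback. The toy task, the `TARGET` plan, and all names here are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Node:
    thought: str
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)
    visits: int = 0
    value: float = 0.0

    def path(self) -> List[str]:
        """Thoughts from just below the root down to this node."""
        out, node = [], self
        while node.parent is not None:
            out.append(node.thought)
            node = node.parent
        return out[::-1]


def ucb(node: Node, c: float = 1.4) -> float:
    if node.visits == 0:
        return float("inf")  # unvisited thoughts are tried first
    exploit = node.value / node.visits
    return exploit + c * math.sqrt(math.log(node.parent.visits) / node.visits)


def rethink_mcts(propose, evaluate, rethink, iterations=20, max_depth=2):
    root = Node("<root>")
    best_path, best_score = [], -1.0
    for _ in range(iterations):
        node = root
        while node.children:                          # 1. selection
            node = max(node.children, key=ucb)
        if len(node.path()) < max_depth:              # 2. expansion
            node.children = [Node(t, parent=node) for t in propose(node.path())]
            if node.children:
                node = node.children[0]
        score, feedback = evaluate(node.path())       # 3. evaluation
        if score < 1.0 and node.parent is not None:   # 4. rethink
            # Rewrite the failing thought in place instead of abandoning it.
            node.thought = rethink(node.path(), feedback)
            score, _ = evaluate(node.path())
        if score > best_score:
            best_path, best_score = node.path(), score
        walker = node
        while walker is not None:                     # 5. backpropagation
            walker.visits += 1
            walker.value += score
            walker = walker.parent
        if best_score == 1.0:
            break
    return best_path, best_score


# Toy stand-ins (assumptions): a real system would query an LLM and run tests.
TARGET = ["sort the list", "return middle element"]


def propose(path):
    options = [["scan linearly", "sort the list"],
               ["return first element", "return middle element"]]
    return options[len(path)] if len(path) < len(options) else []


def evaluate(path):
    for i, thought in enumerate(path):
        if thought != TARGET[i]:
            return i / len(TARGET), f"step {i} fails: {thought!r}"
    return len(path) / len(TARGET), ""


def rethink(path, feedback):
    return TARGET[len(path) - 1]  # oracle fix, standing in for an LLM rewrite


best_path, best_score = rethink_mcts(propose, evaluate, rethink)
print(best_path, best_score)
```

In this sketch the rethink step edits a node's thought in place and re-scores it, so the subtree below a repaired thought is searched from a better starting point. Whether to always keep the rewritten thought, as done here, or only when the score improves is a design choice the abstract does not specify.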
