Title: Hallucination detection in LLM code generation: A sampling-based consensus verification approach
Authors: Taicheng Huang, Zhanhui Ren, Yuan Huang, Xiangping Chen, Yi Liu, Zibin Zheng
Journal: Automated Software Engineering, Vol. 33, No. 2
DOI: 10.1007/s10515-026-00605-0
Publication date: 2026-03-25
URL: https://link.springer.com/article/10.1007/s10515-026-00605-0
Citations: 0
Abstract
Large Language Models (LLMs) have revolutionized code generation, but their outputs often contain "hallucinations": code snippets that look plausible but are actually wrong (e.g., API misuse or logic errors). Existing detection methods mainly rely on dynamic code execution, which requires complex runtime environment configuration. This paper proposes HalluCodeDetector, a new static analysis framework based on sampling-consistency verification. The method rests on the following assumption: when the LLM correctly understands a problem, its sampled outputs show high consistency in syntactic structure, data flow, and API usage patterns. The method proceeds as follows: for a given problem, the LLM repeatedly generates multiple code samples and their semantic/functional consistency is evaluated; a new metric (MRCM) computes the average similarity between a candidate response and the other samples to quantify the likelihood of hallucination. Experiments on the HumanEval+ and MBPP benchmarks demonstrate that HalluCodeDetector achieves AUROC=0.76, outperforming baseline methods such as LYNX by 15.2%, with lower time overhead. Our method provides a secure, efficient, and generalizable solution for improving the reliability of LLM-generated code.
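The sampling-consensus idea described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the abstract states only that MRCM averages the similarity between a candidate and the other sampled generations, and the actual similarity function (built on syntax, data flow, and API-usage patterns) is not specified here, so a simple token-level Jaccard similarity stands in for it.

```python
import re

def tokenize(code: str) -> set[str]:
    """Crude lexical tokenization of a code string (identifiers and symbols)."""
    return set(re.findall(r"[A-Za-z_]\w*|\S", code))

def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two token sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def consensus_score(candidate: str, samples: list[str]) -> float:
    """Average similarity of a candidate against the sampled generations.

    Higher consensus suggests the model's understanding is stable, i.e.
    a lower estimated risk of hallucination.
    """
    toks = tokenize(candidate)
    sims = [jaccard(toks, tokenize(s)) for s in samples]
    return sum(sims) / len(sims)

# Toy example: three sampled solutions to the same prompt, one divergent.
samples = [
    "def add(a, b):\n    return a + b",
    "def add(x, y):\n    return x + y",
    "def add(a, b):\n    return a - b",  # divergent sample
]
candidate = "def add(a, b):\n    return a + b"
score = consensus_score(candidate, samples)
```

In practice a score like this would be thresholded (or fed to an AUROC-style evaluation) to flag low-consensus candidates as likely hallucinations.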
Journal overview:
This journal publishes research papers, tutorials, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.