GPTVD: vulnerability detection and analysis method based on LLM's chain of thoughts
Yinan Chen, Yuan Huang, Xiangping Chen, Pengfei Shen, Lei Yun
Automated Software Engineering, vol. 33, no. 1. Published 2025-09-09. DOI: 10.1007/s10515-025-00550-4
Abstract
Traditional vulnerability detection methods based on rules or learning primarily focus on coarse-grained predictions, often lacking precise localization and interpretability regarding the root causes of vulnerabilities. The growing availability of open-source vulnerability databases calls for advanced methods that can reason about vulnerabilities at a finer slice-level granularity. This paper presents GPTVD, a slice-level vulnerability detection method that leverages large language models' (LLMs) in-context learning (ICL) and chain-of-thought (COT) reasoning capabilities to enhance both detection performance and explainability. GPTVD extracts threat code slices through static code analysis, focusing on data and control dependencies. Positive and negative samples are clustered based on heuristic features and semantic feature vectors, and representative samples are manually annotated with reasoning processes to build COT prompts. These prompts are combined with target samples to form LLM input queries, enabling slice-level vulnerability inference and explanation with the LLM. The method was evaluated on 18,062 programs from a public dataset. GPTVD achieved superior performance compared to existing methods, with 92.21% accuracy, 93.20% precision, and 92.28% recall. Ablation studies confirm that clustering-based prompt selection, explicit threat code slices, and human expert reasoning significantly improve detection effectiveness and interpretability. GPTVD demonstrates that combining static code analysis with LLM-based COT reasoning can effectively detect vulnerabilities at the slice level with high accuracy and interpretability.
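To make the prompt-construction step concrete, the sketch below shows one plausible way to assemble an ICL + COT query from expert-annotated example slices and a target slice, as the abstract describes. This is not the authors' code: the names (AnnotatedSlice, build_cot_prompt) and the prompt wording are hypothetical illustrations of the general technique, not GPTVD's actual pipeline or API.

```python
# Minimal sketch of building a chain-of-thought prompt from representative
# annotated code slices plus a target slice (assumed structure, not GPTVD's).
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedSlice:
    code: str        # threat code slice extracted via static analysis
    reasoning: str   # expert-written chain-of-thought explanation
    label: str       # "vulnerable" or "safe"


def build_cot_prompt(examples: List[AnnotatedSlice], target_slice: str) -> str:
    """Combine representative annotated examples with the target slice
    into a single in-context-learning query for an LLM."""
    parts = ["Decide whether each code slice is vulnerable. "
             "Reason step by step before answering.\n"]
    for i, ex in enumerate(examples, 1):
        parts.append(f"### Example {i}\nSlice:\n{ex.code}\n"
                     f"Reasoning: {ex.reasoning}\nAnswer: {ex.label}\n")
    parts.append(f"### Target\nSlice:\n{target_slice}\nReasoning:")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [AnnotatedSlice(
        code="char buf[8];\nstrcpy(buf, user_input);",
        reasoning="strcpy copies user_input into an 8-byte buffer with no "
                  "bounds check, so oversized input overflows buf.",
        label="vulnerable")]
    print(build_cot_prompt(demo, "int n = atoi(argv[1]);\nchar *p = malloc(n);"))
```

In the paper's workflow the example slices would be the cluster representatives chosen from positive and negative samples; here a single hand-written example stands in for them, and the returned string would be sent to the LLM for slice-level inference and explanation.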
Journal Description
This journal details research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.