GPTVD: vulnerability detection and analysis method based on LLM's chain of thoughts
Yinan Chen, Yuan Huang, Xiangping Chen, Pengfei Shen, Lei Yun
Automated Software Engineering, vol. 33, no. 1. Published 2025-09-09. DOI: 10.1007/s10515-025-00550-4
Abstract
Traditional vulnerability detection methods based on rules or learning primarily focus on coarse-grained predictions, often lacking precise localization and interpretability regarding the root causes of vulnerabilities. The growing availability of open-source vulnerability databases calls for advanced methods that can reason about vulnerabilities at a finer slice-level granularity. This paper presents GPTVD, a slice-level vulnerability detection method that leverages large language models' (LLMs) in-context learning (ICL) and chain-of-thought (COT) reasoning capabilities to enhance both detection performance and explainability. GPTVD extracts threat code slices through static code analysis, focusing on data and control dependencies. Positive and negative samples are clustered based on heuristic features and semantic feature vectors, and representative samples are manually annotated with reasoning processes to build COT prompts. These prompts are combined with target samples to form LLM input queries, enabling slice-level vulnerability inference and explanation with the LLM. The method was evaluated on 18,062 programs from a public dataset. GPTVD achieved superior performance compared to existing methods, with 92.21% accuracy, 93.20% precision, and 92.28% recall. Ablation studies confirm that clustering-based prompt selection, explicit threat code slices, and human expert reasoning significantly improve detection effectiveness and interpretability. GPTVD demonstrates that combining static code analysis with LLM-based COT reasoning can effectively detect vulnerabilities at the slice level with high accuracy and interpretability.
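To make the prompt-construction step concrete, the sketch below shows one plausible way to assemble an ICL + COT query from expert-annotated example slices and a target slice, as the abstract describes. This is not the authors' code: the names (AnnotatedSlice, build_cot_prompt) and the prompt wording are hypothetical illustrations of the general technique, not GPTVD's actual pipeline or API.

```python
# Minimal sketch of building a chain-of-thought prompt from representative
# annotated code slices plus a target slice (assumed structure, not GPTVD's).
from dataclasses import dataclass
from typing import List


@dataclass
class AnnotatedSlice:
    code: str        # threat code slice extracted via static analysis
    reasoning: str   # expert-written chain-of-thought explanation
    label: str       # "vulnerable" or "safe"


def build_cot_prompt(examples: List[AnnotatedSlice], target_slice: str) -> str:
    """Combine representative annotated examples with the target slice
    into a single in-context-learning query for an LLM."""
    parts = ["Decide whether each code slice is vulnerable. "
             "Reason step by step before answering.\n"]
    for i, ex in enumerate(examples, 1):
        parts.append(f"### Example {i}\nSlice:\n{ex.code}\n"
                     f"Reasoning: {ex.reasoning}\nAnswer: {ex.label}\n")
    parts.append(f"### Target\nSlice:\n{target_slice}\nReasoning:")
    return "\n".join(parts)


if __name__ == "__main__":
    demo = [AnnotatedSlice(
        code="char buf[8];\nstrcpy(buf, user_input);",
        reasoning="strcpy copies user_input into an 8-byte buffer with no "
                  "bounds check, so oversized input overflows buf.",
        label="vulnerable")]
    print(build_cot_prompt(demo, "int n = atoi(argv[1]);\nchar *p = malloc(n);"))
```

In the paper's workflow the example slices would be the cluster representatives chosen from positive and negative samples; here a single hand-written example stands in for them, and the returned string would be sent to the LLM for slice-level inference and explanation.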
Journal Description
This journal details research, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering examines both automatic systems and collaborative systems as well as computational models of human software engineering activities. In addition, it presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support or provide theoretical foundations. The journal also includes reviews of books, software, conferences and workshops.