CodeSpeak: Improving smart contract vulnerability detection via LLM-assisted code analysis

Impact factor: 4.1 · Zone 2 (Computer Science) · JCR Q1 (Computer Science, Software Engineering)

Journal of Systems and Software · Published: 2025-09-17 · DOI: 10.1016/j.jss.2025.112635

Shuyu Chang, Chen Geng, Haiping Huang, Rui Wang, Qi Li, Yang Zhang

Journal of Systems and Software, vol. 231, Article 112635. Full text: https://www.sciencedirect.com/science/article/pii/S0164121225003048


Abstract

Smart contracts play a crucial role in blockchain technology, but they remain vulnerable to a range of security threats. While deep learning approaches have shown promise in vulnerability detection, they often require complex graph constructions that complicate the detection process. Large language models (LLMs) offer powerful code comprehension capabilities, but applying them directly to vulnerability detection often yields inconsistent or unreliable results. To address these challenges, we introduce CodeSpeak, a novel framework that enhances smart contract vulnerability detection through LLM-assisted code analysis. Our approach first eliminates redundant code statements to focus on security-critical sections. We then prompt LLMs with purpose-designed, domain-specific instructions that simulate the auditing practices of security experts. These instructions serve as intermediate representations that bridge the gap between natural language and vulnerability patterns. CodeSpeak processes the resulting LLM analysis and embeds it in structured prompt templates, which are used to train a detection model. Compared to deep learning approaches, this framework offers a more intuitive solution while maintaining high detection effectiveness. Extensive experiments on four types of vulnerabilities (Reentrancy, Timestamp, Overflow/Underflow, and Delegatecall) demonstrate the effectiveness of our approach. Our framework also adapts well to new vulnerability types with minimal training samples, and provides a cost-effective solution for practical deployment. Moreover, a user study with developers shows that CodeSpeak reduces detection time by 98.7% compared to manual analysis while maintaining superior accuracy. These improvements highlight the potential of LLM-assisted code analysis in smart contract security assessment.
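The abstract does not say how redundant statements are removed or what the prompt templates look like, so the following is only a rough sketch of the first and third steps under invented assumptions. The keyword patterns, the `reduce_contract` and `build_prompt` helpers, the template wording, and the sample `Bank` contract are all hypothetical and are not taken from the paper. The LLM analysis step is left out and passed in as a plain string.

```python
import re

# Hypothetical keyword patterns per vulnerability type (illustrative only,
# not the paper's rules). The Reentrancy pattern also keeps plain and
# compound assignments so that state writes survive the reduction.
SECURITY_PATTERNS = {
    "Reentrancy": re.compile(r"\.call\b|\.send\(|\.transfer\(|(?<![=!<>])=(?![=>])"),
    "Timestamp": re.compile(r"block\.timestamp|\bnow\b"),
    "Overflow/Underflow": re.compile(r"\+=|-=|\*=|\s[+\-*]\s"),
    "Delegatecall": re.compile(r"\.delegatecall\b"),
}

# Hypothetical template; the paper's actual template is not given in the abstract.
PROMPT_TEMPLATE = (
    "Vulnerability type: {vuln_type}\n"
    "Reduced code:\n{code}\n"
    "Auditor-style analysis:\n{analysis}\n"
    "Is the contract vulnerable? Answer yes or no."
)


def strip_comments(source: str) -> str:
    """Remove /* ... */ and // comments (naive: ignores string literals)."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.DOTALL)
    return re.sub(r"//[^\n]*", "", source)


def reduce_contract(source: str, vuln_type: str) -> list[str]:
    """Keep function signatures and lines matching the type's pattern."""
    pattern = SECURITY_PATTERNS[vuln_type]
    kept = []
    for line in strip_comments(source).splitlines():
        text = line.strip()
        if not text:
            continue
        if text.startswith("function ") or pattern.search(text):
            kept.append(text)
    return kept


def build_prompt(code_lines: list[str], analysis: str, vuln_type: str) -> str:
    """Fill the template with reduced code and an (externally produced) analysis."""
    return PROMPT_TEMPLATE.format(
        vuln_type=vuln_type, code="\n".join(code_lines), analysis=analysis
    )


# Made-up sample contract for demonstration.
EXAMPLE = """
pragma solidity ^0.4.24;
// simple bank
contract Bank {
    mapping(address => uint) balances;
    function withdraw(uint amount) public {
        require(balances[msg.sender] >= amount);
        msg.sender.call.value(amount)();  /* external call first */
        balances[msg.sender] -= amount;
    }
}
"""

reduced = reduce_contract(EXAMPLE, "Reentrancy")
# The analysis string stands in for what an LLM would return.
print(build_prompt(reduced, "External call precedes the balance update.", "Reentrancy"))
```

A line-level keyword filter like this is far cruder than what a real reduction step would need (it ignores control flow and data dependencies), but it shows the general shape: less code goes into the prompt, and the analysis text sits beside it in a fixed layout that a downstream classifier could be trained on.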


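Source journal: Journal of Systems and Software (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.60
Self-citation rate: 5.70%
Articles published: 193
Review time: 16 weeks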

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.