DeepFusion: Smart Contract Vulnerability Detection Via Deep Learning and Data Fusion

IF 5.7 · Zone 2, Computer Science · Q1 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Reliability Pub Date : 2024-10-29 DOI:10.1109/TR.2024.3480010

Hanting Chu;Pengcheng Zhang;Hai Dong;Yan Xiao;Shunhui Ji

IEEE Transactions on Reliability, vol. 74, no. 3, pp. 3544-3558. Journal article. https://ieeexplore.ieee.org/document/10737415/

Citations: 0

Abstract

Smart contracts execute transactions worth hundreds of millions of dollars daily, so smart contract security has attracted considerable attention in recent years. Traditional vulnerability detection methods rely heavily on manually developed rules and features, which leads to low accuracy, high false-positive rates, and poor scalability. Deep learning approaches were designed to alleviate these problems, but most of them rely on a single type of feature, which may leave the model with insufficient information during learning. The lack of available labeled vulnerability datasets is a further major limitation. To address these issues, we collect and construct a labeled dataset covering five smart contract vulnerabilities and propose DeepFusion, a vulnerability detection method that fuses two kinds of code representation information: program slice information and abstract syntax tree (AST) structural information. First, we develop automated tools that extract vulnerability-related slices from the source code and structural information from the AST generated from that source code. Second, the code features and the global structural features are fused. Finally, the fused data are fed into a Bidirectional Long Short-Term Memory with Attention (BiLSTM+ATT) model for smart contract vulnerability detection. The BiLSTM captures long-term dependencies in both directions, which suits the serialized information generated by DeepFusion, while the attention mechanism highlights the characteristic information of vulnerabilities. We conducted experiments on a dataset collected from real smart contracts. The results show that our method significantly outperforms existing methods in detecting reentrancy, timestamp dependence, integer overflow and underflow, use of tx.origin for authentication, and unprotected self-destruct instruction vulnerabilities, by 6.36%, 6.42%, 16.5%, 21.29%, and 25.05%, respectively. To the best of our knowledge, this is the first work to detect the latter two vulnerabilities using deep learning methods.
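The three steps named in the abstract (slice extraction, fusion with AST structure, attention over a sequence) can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration and not the authors' implementation: the keyword table `SLICE_PATTERNS`, the keyword-matching stand-in for real program slicing, the hand-written AST node types, fusion by simple concatenation with a `<SEP>` token, and dot-product attention with a fixed query vector in place of a trained BiLSTM+ATT model.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical trigger patterns, one per vulnerability class in the abstract.
# The paper's actual slicing criteria are not given in this listing.
SLICE_PATTERNS: Dict[str, str] = {
    "reentrancy": r"\.call\.value\(|\.call\{",
    "timestamp_dependence": r"block\.timestamp|\bnow\b",
    "integer_overflow_underflow": r"\+=|-=|\*=|\+\+|--",
    "tx_origin_auth": r"tx\.origin",
    "unprotected_selfdestruct": r"\bselfdestruct\(|\bsuicide\(",
}


def extract_slice(source: str, vulnerability: str) -> List[str]:
    """Keep lines matching the trigger pattern (crude stand-in for slicing)."""
    pattern = re.compile(SLICE_PATTERNS[vulnerability])
    return [line.strip() for line in source.splitlines() if pattern.search(line)]


def tokenize(code_lines: Sequence[str]) -> List[str]:
    """Split code into identifiers, numbers, and single punctuation marks."""
    tokens: List[str] = []
    for line in code_lines:
        tokens.extend(re.findall(r"[A-Za-z_]\w*|\d+|\S", line))
    return tokens


def fuse(slice_tokens: Sequence[str], ast_node_types: Sequence[str],
         sep: str = "<SEP>") -> List[str]:
    """Assumed fusion: concatenate slice tokens and AST node types."""
    return list(slice_tokens) + [sep] + list(ast_node_types)


def attention_pool(states: Sequence[Sequence[float]],
                   query: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Dot-product attention: softmax weights and the weighted sum of states."""
    scores = [sum(s * q for s, q in zip(state, query)) for state in states]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(score - peak) for score in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * state[d] for w, state in zip(weights, states))
               for d in range(len(query))]
    return weights, context


CONTRACT = """
contract Wallet {
    address owner;
    function withdraw(uint amount) public {
        require(tx.origin == owner);
        msg.sender.transfer(amount);
    }
}
"""

slice_lines = extract_slice(CONTRACT, "tx_origin_auth")
# Node types written by hand; in practice they would come from a Solidity
# compiler's AST output.
ast_types = ["ContractDefinition", "FunctionDefinition", "FunctionCall", "MemberAccess"]
fused = fuse(tokenize(slice_lines), ast_types)

# Toy hidden states: the first state aligns with the query, so it gets more weight.
weights, context = attention_pool([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In a real pipeline the fused token sequence would be embedded and passed through a trained bidirectional LSTM, and the attention query would be learned, so that tokens such as `tx.origin` receive higher weight than surrounding boilerplate.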




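Source journal

IEEE Transactions on Reliability (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 12.20

Self-citation rate: 8.50%

Articles published: 153

Review time: 7.5 months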

Journal description: IEEE Transactions on Reliability is a refereed journal for the reliability and allied disciplines including, but not limited to, maintainability, physics of failure, life testing, prognostics, design and manufacture for reliability, reliability for systems of systems, network availability, mission success, warranty, safety, and various measures of effectiveness. Topics eligible for publication range from hardware to software, from materials to systems, from consumer and industrial devices to manufacturing plants, from individual items to networks, from techniques for making things better to ways of predicting and measuring behavior in the field. As an engineering subject that supports new and existing technologies, we constantly expand into new areas of the assurance sciences.