Block-gram: Mining knowledgeable features for efficiently smart contract vulnerability detection

IF 7.5, Zone 2 (Computer Science), JCR Q1 (Telecommunications)

Digital Communications and Networks, Pub Date: 2025-02-01, DOI: 10.1016/j.dcan.2023.07.009

Xueshuo Xie, Haolong Wang, Zhaolong Jian, Yaozheng Fang, Zichun Wang, Tao Li

Volume 11, Issue 1, Pages 1-12. Source: https://www.sciencedirect.com/science/article/pii/S2352864823001347

Citations: 0

Abstract


Block-gram: Mining knowledgeable features for efficiently smart contract vulnerability detection

Smart contracts are widely used on the blockchain to implement complex transactions, such as decentralized applications on Ethereum. Effective vulnerability detection for large-scale smart contracts is critical, as attacks on smart contracts often cause huge economic losses. Because deployed smart contracts are difficult to repair and update, vulnerabilities must be found before deployment. However, code analysis, which requires path traversal, and learning methods, which require training on many features, are too time-consuming for detecting vulnerabilities in large-scale on-chain contracts. In contrast to code analysis methods such as symbolic execution, learning-based methods obtain detection models from a feature space. But existing features offer little interpretability for the detection results and the trained model; even worse, the large feature space also reduces detection efficiency. This paper focuses on improving detection efficiency by reducing the dimension of the features with the help of expert knowledge. A feature extraction model, Block-gram, is proposed to form low-dimensional knowledge-based features from bytecode. First, the metadata is separated and the runtime code is converted into a sequence of opcodes, which is divided into segments at certain instructions (jumps, etc.). Then, scalable Block-gram features, including 4-dimensional block features and 8-dimensional attribute features, are mined for training learning-based models. Finally, feature contributions are calculated from SHAP values to measure the relationship between the features and the results of the detection model. In addition, six types of vulnerability labels are created for a dataset containing 33,885 contracts, and the knowledge-based features are evaluated using seven state-of-the-art learning algorithms. The results show that, compared with features extracted by N-gram, detection is on average 25× to 650× faster and the interpretability of the detection model is enhanced.
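
The preprocessing steps named in the abstract (separating the metadata, converting runtime code to opcodes, and cutting the sequence at jumps) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the small opcode table, and the exact set of block-ending instructions are assumptions made for illustration, since the abstract only says "jumps, etc."; the paper's 4-dimensional block features and 8-dimensional attribute features are not reproduced because the abstract does not define them. The metadata check relies on the Solidity convention that the last two bytes of the runtime code give the length of a trailing CBOR map.

```python
# Hypothetical sketch of the preprocessing described in the abstract.
# Only a few opcodes are named; any other byte is rendered as hex.
OPCODE_NAMES = {
    0x00: "STOP", 0x01: "ADD", 0x35: "CALLDATALOAD", 0x52: "MSTORE",
    0x54: "SLOAD", 0x55: "SSTORE", 0x56: "JUMP", 0x57: "JUMPI",
    0x5B: "JUMPDEST", 0xF1: "CALL", 0xF3: "RETURN", 0xFD: "REVERT",
    0xFE: "INVALID", 0xFF: "SELFDESTRUCT",
}
# Assumed segment boundaries: jumps plus instructions that halt execution.
BLOCK_ENDERS = {"JUMP", "JUMPI", "STOP", "RETURN", "REVERT",
                "INVALID", "SELFDESTRUCT"}


def strip_metadata(runtime: bytes) -> bytes:
    """Drop the trailing CBOR metadata appended by solc, if it looks present."""
    if len(runtime) < 2:
        return runtime
    cbor_len = int.from_bytes(runtime[-2:], "big")
    start = len(runtime) - 2 - cbor_len
    # Heuristic: the payload must fit and begin with a CBOR map (major type 5).
    if cbor_len > 0 and start >= 0 and runtime[start] >> 5 == 5:
        return runtime[:start]
    return runtime


def disassemble(code: bytes) -> list[str]:
    """Convert raw bytes into opcode names, skipping PUSH immediates."""
    ops = []
    i = 0
    while i < len(code):
        op = code[i]
        if 0x60 <= op <= 0x7F:           # PUSH1..PUSH32
            width = op - 0x5F            # number of immediate bytes
            ops.append(f"PUSH{width}")
            i += width
        else:
            ops.append(OPCODE_NAMES.get(op, f"0x{op:02x}"))
        i += 1
    return ops


def split_blocks(ops: list[str]) -> list[list[str]]:
    """Cut the opcode sequence after block enders and before JUMPDEST."""
    blocks, current = [], []
    for name in ops:
        if name == "JUMPDEST" and current:
            blocks.append(current)
            current = []
        current.append(name)
        if name in BLOCK_ENDERS:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


# Toy runtime code followed by a made-up 10-byte CBOR map and its length.
runtime = bytes.fromhex("6000356007570" "05b600160005500"
                        "a164736f6c6343000813" "000a")
code = strip_metadata(runtime)
blocks = split_blocks(disassemble(code))
for block in blocks:
    print(block)
```

Per-block statistics computed over segments like these would yield a fixed, small number of features per contract, whereas an N-gram vocabulary grows with the number of distinct opcode sequences observed; that is the dimensionality contrast the abstract draws.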


Source journal

Digital Communications and Networks Computer Science-Hardware and Architecture

CiteScore: 12.80

Self-citation rate: 5.10%

Articles published: 915

Review time: 30 weeks

About the journal: Digital Communications and Networks is a journal that focuses on communication systems and networks. It publishes original articles and authoritative reviews, all of which undergo rigorous peer review. All articles are fully Open Access and can be accessed on ScienceDirect. The journal is indexed by databases such as the Science Citation Index Expanded (SCIE) and Scopus. In addition to regular articles, it may also consider exceptional conference papers that have been significantly expanded, and it periodically releases special issues that focus on specific aspects of the field.