DAppSCAN: Building Large-Scale Datasets for Smart Contract Weaknesses in DApp Projects

IF 6.5 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Zibin Zheng;Jianzhong Su;Jiachi Chen;David Lo;Zhijie Zhong;Mingxi Ye
Journal: IEEE Transactions on Software Engineering
DOI: 10.1109/TSE.2024.3383422
Publication date: 2024-03-29
Publication type: Journal Article
URL: https://ieeexplore.ieee.org/document/10486822/
Citations: 0

Abstract
The Smart Contract Weakness Classification Registry (SWC Registry) is a widely recognized list of smart contract weaknesses specific to the Ethereum platform. Despite the SWC Registry not being updated with new entries since 2020, the sustained development of smart contract analysis tools for detecting SWC-listed weaknesses highlights their ongoing significance in the field. However, evaluating these tools has proven challenging due to the absence of a large, unbiased, real-world dataset. To address this problem, we aim to build a large-scale SWC weakness dataset from real-world DApp projects. We recruited 22 participants and spent 44 person-months analyzing 1,199 open-source audit reports from 29 security teams. In total, we identified 9,154 weaknesses and developed two distinct datasets, i.e., DAppSCAN-Source and DAppSCAN-Bytecode. The DAppSCAN-Source dataset comprises 39,904 Solidity files, featuring 1,618 SWC weaknesses sourced from 682 real-world DApp projects. However, the Solidity files in this dataset may not be directly compilable for further analysis. To facilitate automated analysis, we developed a tool capable of automatically identifying dependency relationships within DApp projects and completing missing public libraries. Using this tool, we created the DAppSCAN-Bytecode dataset, which consists of 6,665 compiled smart contracts with 888 SWC weaknesses. Based on DAppSCAN-Bytecode, we conducted an empirical study to evaluate the performance of state-of-the-art smart contract weakness detection tools. The evaluation results revealed sub-par performance for these tools in terms of both effectiveness and success detection rate, indicating that future development should prioritize real-world datasets over simplistic toy contracts.
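The abstract does not describe how the dependency tool works internally. As a rough illustration only, the sketch below assumes that dependencies can be found by scanning Solidity `import` statements and that an import counts as "missing" when it does not resolve to a file in the project (for example, a public library that was not checked in). The function names `imports_of`, `resolve`, and `dependency_report`, and the sample file paths, are hypothetical and are not taken from the paper.

```python
import re
from pathlib import PurePosixPath

# Captures the quoted path in forms such as:
#   import "path";   import {A} from "path";   import * as X from "path";
IMPORT_RE = re.compile(r'^\s*import\b[^;"\']*["\']([^"\']+)["\']', re.MULTILINE)


def imports_of(source: str) -> list[str]:
    """Return the raw import paths found in one Solidity source string."""
    return IMPORT_RE.findall(source)


def resolve(importer: str, target: str) -> str:
    """Resolve a relative import against the importing file's directory.

    Non-relative imports (e.g. package-style paths) are returned unchanged.
    """
    if not target.startswith("."):
        return target
    parts: list[str] = []
    for part in (PurePosixPath(importer).parent / target).parts:
        if part == "..":
            if parts:
                parts.pop()
        else:
            parts.append(part)
    return "/".join(parts)


def dependency_report(files: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Build a file -> dependencies map and list imports absent from the project."""
    graph: dict[str, list[str]] = {}
    missing: set[str] = set()
    for path, source in files.items():
        deps = [resolve(path, target) for target in imports_of(source)]
        graph[path] = deps
        missing.update(dep for dep in deps if dep not in files)
    return graph, sorted(missing)


# Hypothetical two-file project used only to exercise the sketch.
example_files = {
    "contracts/Token.sol": (
        "pragma solidity ^0.8.0;\n"
        'import "@openzeppelin/contracts/token/ERC20/ERC20.sol";\n'
        'import {Math} from "./lib/Math.sol";\n'
        'import "../interfaces/IToken.sol";\n'
        "contract Token {}\n"
    ),
    "contracts/lib/Math.sol": "library Math {}\n",
}
graph, missing = dependency_report(example_files)
print(missing)
```

In this sketch, the `missing` list is what a completion step would then need to supply before compilation; how the paper's tool actually locates and fetches such libraries is not stated in the abstract.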
Source journal
IEEE Transactions on Software Engineering
Category: Engineering Technology, Engineering: Electrical & Electronic
CiteScore: 9.70
Self-citation rate: 10.80%
Publications: 724
Review time: 6 months
Journal description: IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include: a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models. b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects. c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards. d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues. e) System issues: Hardware-software trade-offs. f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.