SCsVulSegLytix: Detecting and extracting vulnerable segments from smart contracts using weakly-supervised learning

IF 4.1 · CAS Zone 2 (Computer Science) · JCR Q1 (Computer Science, Software Engineering)

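Journal of Systems and Software, Volume 230, Article 112532 · Pub Date: 2025-07-17 · DOI: 10.1016/j.jss.2025.112532 · https://www.sciencedirect.com/science/article/pii/S0164121225002006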

Borna Ahmadzadeh, Arousha Haghighian Roudsari, Sepideh HajiHosseinKhani, Arash Habibi Lashkari


Citations: 0

Abstract

Smart contracts (SCs), self-executing digital contracts deployed on blockchain networks, are becoming increasingly prevalent in sectors such as finance thanks to their automation, transparency, and cost efficiency. Given the substantial value of the assets they manage, SCs have become attractive targets for hackers, who exploit their vulnerabilities to steal funds. Blockchain’s inherent immutability means vulnerabilities cannot be fixed quickly, and the immaturity of the Solidity programming language, which introduces potential security threats to SCs, exacerbates this problem. As such, there is a pressing need to develop security measures that identify vulnerabilities in SCs. Non-learning-based detection methods, which rely on expert-designed heuristics, often cannot handle the evolving complexity of SC vulnerabilities. Learning-based solutions typically outperform them but generally do not pinpoint the locations of vulnerabilities in SCs. Learning-based approaches that do identify vulnerability locations come with several challenges. First, they convert SCs into graphs, incurring computational overhead and making the learning system more complex. Second, most require line- or function-level labels for training, which are difficult to gather. Lastly, their coverage of vulnerability types is limited, leaving users exposed to the vulnerabilities they do not cover. This work presents SCsVulSegLytix, a learning-based approach for detecting and extracting vulnerable segments in SCs. SCsVulSegLytix uses a source-code-based Transformer model trained with contract-level labels to classify entire contracts as vulnerable or not, followed by a post-hoc interpretability method that extracts vulnerable segments according to relevance scores. Unlike previous extraction models, SCsVulSegLytix requires no line-level annotations and can be trained using contract-level labels only, which are much easier to collect. Moreover, it operates directly on Solidity source code, substantially improving efficiency compared to expensive graph-based models. Finally, it supports several important classes of SC vulnerabilities, helping protect developers against a range of potential attacks. Experiments show that our model outperforms existing models in both contract- and line-level vulnerability identification while achieving greater computational efficiency.
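
The abstract does not say which post-hoc interpretability method is used or how relevance scores are turned into segments, so the following is only a minimal sketch of the general idea, under stated assumptions: some method has already produced one relevance score per token, and a "segment" is taken to be a source line. The function names, the whitespace tokenization, the sum-per-line aggregation, and the toy scores are all hypothetical and are not taken from the paper.

```python
import bisect
import re
from collections import defaultdict


def line_scores(source, token_spans, relevance):
    """Sum per-token relevance into per-line scores (1-based line numbers)."""
    if len(token_spans) != len(relevance):
        raise ValueError("need exactly one relevance score per token")
    # Character offset at which each line starts.
    line_starts = [0] + [m.end() for m in re.finditer(r"\n", source)]
    scores = defaultdict(float)
    for (start, _end), score in zip(token_spans, relevance):
        # The line containing a token is the last line starting at or before it.
        line_no = bisect.bisect_right(line_starts, start)
        scores[line_no] += score
    return dict(scores)


def top_segments(source, token_spans, relevance, top_k=1):
    """Return the top_k highest-scoring lines as (line_no, text), in source order."""
    scores = line_scores(source, token_spans, relevance)
    ranked = sorted(scores, key=lambda n: scores[n], reverse=True)[:top_k]
    lines = source.split("\n")
    return [(n, lines[n - 1].strip()) for n in sorted(ranked)]


# Toy input (assumption): whitespace tokens and made-up relevance scores
# standing in for the output of a real interpretability method.
source = (
    "function withdraw() public {\n"
    "    msg.sender.call{value: balances[msg.sender]}(\"\");\n"
    "    balances[msg.sender] = 0;\n"
    "}"
)
spans = [m.span() for m in re.finditer(r"\S+", source)]
toy_relevance = [0.9 if "call" in source[s:e] else 0.1 for s, e in spans]

print(top_segments(source, spans, toy_relevance, top_k=1))
```

Because the only supervision in such a setup is a contract-level label, line-level output has to be derived after the fact from the classifier's relevance scores. Summing token scores per line is one simple choice; averaging or taking the maximum would be equally plausible alternatives.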




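Source journal

Journal of Systems and Software (Engineering & Technology, Computer Science: Theory & Methods)

CiteScore: 8.60

Self-citation rate: 5.70%

Articles published: 193

Review time: 16 weeks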

About the journal: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:

• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes

The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.