Towards cost-efficient vulnerability detection with cross-modal adversarial reprogramming

IF 3.7 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software Pub Date : 2025-02-10 DOI:10.1016/j.jss.2025.112365

Zhenzhou Tian, Rui Qiu, Yudong Teng, Jiaze Sun, Yanping Chen, Lingwei Chen

Journal of Systems and Software, volume 223, Article 112365 (journal article, not open access). https://www.sciencedirect.com/science/article/pii/S0164121225000330

Citations: 0

Abstract

While deep learning (DL) has advanced the automatic detection of software vulnerabilities, current DL-based methods still face two major, largely overlooked obstacles: the scarcity of vulnerable code samples and the high computational cost of training models from scratch. This paper introduces Capture, a novel Cross-modal Adversarial reProgramming approach Towards cost-efficient vUlneRability dEtection, which reduces the need for large, well-labeled vulnerability datasets and reduces training time. Specifically, Capture first performs lexical parsing and linearization on the abstract syntax tree (AST) of the source code to extract structure- and type-aware token sequences. These sequences are transformed into a perturbation image by retrieving and reshaping each token’s embedding from a learnable universal perturbation dictionary. This allows a pre-trained model originally designed for image classification to be repurposed for code vulnerability detection, with a dynamic label remapping scheme applied at the end that maps the model’s output to the binary vulnerability detection result. Our experiments demonstrate that Capture achieves detection accuracy comparable to state-of-the-art methods while improving training efficiency, because only a small number of parameters are updated during training. Notably, Capture excels in scenarios with limited vulnerable samples, delivering higher detection accuracy and F1 scores than the baseline methods.
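The pipeline described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it uses Python's own `ast` module as a stand-in parser, random values in place of a trained perturbation dictionary, and a fixed many-to-one label mapping in place of the paper's dynamic remapping scheme. The names `linearize`, `build_dictionary`, `tokens_to_image`, `remap_labels`, `EMBED_DIM`, and `IMAGE_SIDE` are all hypothetical, and the frozen image classifier is represented only by a list of class probabilities supplied by the caller.

```python
import ast
import random

EMBED_DIM = 4    # hypothetical length of each token embedding
IMAGE_SIDE = 4   # hypothetical side length of the square perturbation "image"


def linearize(source):
    """Pre-order walk of the AST, emitting node type names and identifiers."""
    tokens = []

    def visit(node):
        tokens.append(type(node).__name__)   # type-aware token
        if isinstance(node, ast.Name):
            tokens.append(node.id)           # identifier token
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_dictionary(vocab, dim=EMBED_DIM, seed=0):
    """One embedding per token. In a real setup these values would be the
    trainable parameters; here they are random placeholders."""
    rng = random.Random(seed)
    return {tok: [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for tok in sorted(vocab)}


def tokens_to_image(tokens, dictionary, side=IMAGE_SIDE, dim=EMBED_DIM):
    """Look up each token's embedding, concatenate, then pad or truncate
    to side*side values and reshape into a side x side grid."""
    unknown = [0.0] * dim
    flat = []
    for tok in tokens:
        flat.extend(dictionary.get(tok, unknown))
    flat = flat[: side * side]
    flat += [0.0] * (side * side - len(flat))
    return [flat[r * side:(r + 1) * side] for r in range(side)]


def remap_labels(source_probs, mapping):
    """Sum the source-class probabilities assigned to each target label
    (a fixed mapping, assumed here for simplicity) and pick the largest."""
    scores = {target: sum(source_probs[i] for i in ids)
              for target, ids in mapping.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    toks = linearize("x = y")
    image = tokens_to_image(toks, build_dictionary(set(toks)))
    # Stand-in for the frozen image classifier's output over 4 source classes.
    fake_probs = [0.1, 0.2, 0.3, 0.4]
    print(remap_labels(fake_probs, {"vulnerable": [0, 1], "benign": [2, 3]}))
```

In this sketch the dictionary is the only component that would be trained, which mirrors the abstract's point that few parameters need updating while the image model stays fixed.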




Source journal

Journal of Systems and Software (Engineering & Technology - Computer Science: Theory & Methods)

CiteScore

8.60

Self-citation rate

5.70%

Articles published

193

Review time

16 weeks

Journal description: The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to:
• Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution
• Agile, model-driven, service-oriented, open source and global software development
• Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems
• Human factors and management concerns of software development
• Data management and big data issues of software systems
• Metrics and evaluation, data mining of software development resources
• Business and economic aspects of software development processes
The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.