Deep Domain Adaptation With Max-Margin Principle for Cross-Project Imbalanced Software Vulnerability Detection

IF 6.2 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-05-09 DOI:10.1145/3664602

Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, Dinh Phung

{"title":"Deep Domain Adaptation With Max-Margin Principle for Cross-Project Imbalanced Software Vulnerability Detection","authors":"Van Nguyen, Trung Le, Chakkrit Tantithamthavorn, John Grundy, Dinh Phung","doi":"10.1145/3664602","DOIUrl":null,"url":null,"abstract":"<p>Software vulnerabilities (SVs) have become a common, serious, and crucial concern due to the ubiquity of computer software. Many AI-based approaches have been proposed to solve the software vulnerability detection (SVD) problem to ensure the security and integrity of software applications (in both the development and testing phases). However, there are still two open and significant issues for SVD in terms of i) learning automatic representations to improve the predictive performance of SVD, and ii) tackling the scarcity of labeled vulnerability datasets that conventionally need laborious labeling effort by experts. In this paper, we propose a novel approach to tackle these two crucial issues. We first exploit the automatic representation learning with deep domain adaptation for SVD. We then propose a novel cross-domain kernel classifier leveraging the max-margin principle to significantly improve the transfer learning process of SVs from imbalanced labeled into imbalanced unlabeled projects. <i>Our approach is the first work that leverages solid body theories of the max-margin principle, kernel methods, and bridging the gap between source and target domains for imbalanced domain adaptation (DA) applied in cross-project SVD</i>. The experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method obtains a higher performance on F1-measure, one of the most important measures in SVD, from 1.83% to 6.25% compared to the second highest method in the used datasets.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"69 3 1","pages":""},"PeriodicalIF":6.2000,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3664602","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Software vulnerabilities (SVs) have become a common, serious, and crucial concern due to the ubiquity of computer software. Many AI-based approaches have been proposed to solve the software vulnerability detection (SVD) problem to ensure the security and integrity of software applications (in both the development and testing phases). However, there are still two open and significant issues for SVD in terms of i) learning automatic representations to improve the predictive performance of SVD, and ii) tackling the scarcity of labeled vulnerability datasets that conventionally need laborious labeling effort by experts. In this paper, we propose a novel approach to tackle these two crucial issues. We first exploit the automatic representation learning with deep domain adaptation for SVD. We then propose a novel cross-domain kernel classifier leveraging the max-margin principle to significantly improve the transfer learning process of SVs from imbalanced labeled into imbalanced unlabeled projects. Our approach is the first work that leverages solid body theories of the max-margin principle, kernel methods, and bridging the gap between source and target domains for imbalanced domain adaptation (DA) applied in cross-project SVD. The experimental results on real-world software datasets show the superiority of our proposed method over state-of-the-art baselines. In short, our method obtains a higher performance on F1-measure, one of the most important measures in SVD, from 1.83% to 6.25% compared to the second highest method in the used datasets.

查看原文本刊更多论文

利用最大边际原则进行跨项目不平衡软件漏洞检测的深度域自适应技术

由于计算机软件无处不在，软件漏洞（SVs）已成为一个普遍、严重和至关重要的问题。人们提出了许多基于人工智能的方法来解决软件漏洞检测（SVD）问题，以确保软件应用程序（在开发和测试阶段）的安全性和完整性。然而，软件漏洞检测仍有两个重要问题有待解决：一是学习自动表征以提高软件漏洞检测的预测性能；二是解决标注漏洞数据集稀缺的问题，传统上需要专家费力地进行标注。在本文中，我们提出了一种新方法来解决这两个关键问题。我们首先利用 SVD 的深度域适应自动表示学习。然后，我们提出了一种利用最大边际原则的新型跨域内核分类器，以显著改善 SV 从不平衡性标注项目到不平衡性非标注项目的迁移学习过程。我们的方法是利用最大边际原理、内核方法以及缩小源域和目标域之间的差距等坚实理论来实现跨项目 SVD 中不平衡域适应（DA）的第一项工作。在实际软件数据集上的实验结果表明，我们提出的方法优于最先进的基线方法。简而言之，与所使用数据集中排名第二的方法相比，我们的方法在 F1 测量（SVD 中最重要的测量指标之一）上获得了更高的性能，从 1.83% 提高到 6.25%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Software Engineering and Methodology 工程技术-计算机：软件工程

CiteScore

6.30

自引率

4.50%

发文量

164

审稿时长

>12 weeks

期刊介绍： Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.