利用后门触发器探索语义冗余：基于 DNN 的软件漏洞检测所面临挑战的补充见解

IF 6.2 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

ACM Transactions on Software Engineering and Methodology Pub Date : 2024-01-24 DOI:10.1145/3640333

Changjie Shao, Gaolei Li, Jun Wu, Xi Zheng

{"title":"利用后门触发器探索语义冗余：基于 DNN 的软件漏洞检测所面临挑战的补充见解","authors":"Changjie Shao, Gaolei Li, Jun Wu, Xi Zheng","doi":"10.1145/3640333","DOIUrl":null,"url":null,"abstract":"<p>To detect software vulnerabilities with better performance, deep neural networks (DNNs) have received extensive attention recently. However, these vulnerability detection DNN models trained with code representations are vulnerable to specific perturbations on code representations. This motivates us to rethink the bane of software vulnerability detection and find function-agnostic features during code representation which we name as semantic redundant features. This paper first identifies a tight correlation between function-agnostic triggers and semantic redundant feature space (where the redundant features reside) in these DNN models. For correlation identification, we propose a novel Backdoor-based Semantic Redundancy Exploration (BSemRE) framework. In BSemRE, the sensitivity of the trained models to function-agnostic triggers is observed to verify the existence of semantic redundancy in various code representations. Specifically, acting as the typical manifestations of semantic redundancy, naming conventions, ternary operators and identically-true conditions are exploited to generate function-agnostic triggers. Extensive comparative experiments on 1613823 samples of 8 representative vulnerability datasets and state-of-the-art code representation techniques and vulnerability detection models demonstrate that the existence of semantic redundancy determines the upper trustworthiness limit of DNN-based software vulnerability detection. To the best of our knowledge, this is the first work exploring the bane of software vulnerability detection using backdoor triggers.</p>","PeriodicalId":50933,"journal":{"name":"ACM Transactions on Software Engineering and Methodology","volume":"5 1","pages":""},"PeriodicalIF":6.2000,"publicationDate":"2024-01-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Exploring Semantic Redundancy using Backdoor Triggers: A Complementary Insight into the Challenges facing DNN-based Software Vulnerability Detection\",\"authors\":\"Changjie Shao, Gaolei Li, Jun Wu, Xi Zheng\",\"doi\":\"10.1145/3640333\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>To detect software vulnerabilities with better performance, deep neural networks (DNNs) have received extensive attention recently. However, these vulnerability detection DNN models trained with code representations are vulnerable to specific perturbations on code representations. This motivates us to rethink the bane of software vulnerability detection and find function-agnostic features during code representation which we name as semantic redundant features. This paper first identifies a tight correlation between function-agnostic triggers and semantic redundant feature space (where the redundant features reside) in these DNN models. For correlation identification, we propose a novel Backdoor-based Semantic Redundancy Exploration (BSemRE) framework. In BSemRE, the sensitivity of the trained models to function-agnostic triggers is observed to verify the existence of semantic redundancy in various code representations. Specifically, acting as the typical manifestations of semantic redundancy, naming conventions, ternary operators and identically-true conditions are exploited to generate function-agnostic triggers. Extensive comparative experiments on 1613823 samples of 8 representative vulnerability datasets and state-of-the-art code representation techniques and vulnerability detection models demonstrate that the existence of semantic redundancy determines the upper trustworthiness limit of DNN-based software vulnerability detection. To the best of our knowledge, this is the first work exploring the bane of software vulnerability detection using backdoor triggers.</p>\",\"PeriodicalId\":50933,\"journal\":{\"name\":\"ACM Transactions on Software Engineering and Methodology\",\"volume\":\"5 1\",\"pages\":\"\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2024-01-24\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Software Engineering and Methodology\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1145/3640333\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Software Engineering and Methodology","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3640333","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

为了以更好的性能检测软件漏洞，深度神经网络（DNN）近来受到广泛关注。然而，这些用代码表示法训练的漏洞检测 DNN 模型很容易受到代码表示法特定扰动的影响。这促使我们重新思考软件漏洞检测的祸根，并在代码表示过程中找到与功能无关的特征，我们将其命名为语义冗余特征。本文首先确定了这些 DNN 模型中功能无关触发器与语义冗余特征空间（冗余特征所在）之间的紧密相关性。为了识别相关性，我们提出了一种新颖的基于后门的语义冗余探索（BSemRE）框架。在 BSemRE 中，我们观察了训练有素的模型对与功能无关的触发器的敏感性，以验证各种代码表示中是否存在语义冗余。具体来说，作为语义冗余的典型表现形式，命名约定、三元运算符和同真条件被用来生成功能无关触发器。对 8 个代表性漏洞数据集的 1613823 个样本以及最先进的代码表示技术和漏洞检测模型进行的广泛对比实验表明，语义冗余的存在决定了基于 DNN 的软件漏洞检测的可信度上限。据我们所知，这是第一项探索利用后门触发器进行软件漏洞检测的难题的工作。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Exploring Semantic Redundancy using Backdoor Triggers: A Complementary Insight into the Challenges facing DNN-based Software Vulnerability Detection

To detect software vulnerabilities with better performance, deep neural networks (DNNs) have received extensive attention recently. However, these vulnerability detection DNN models trained with code representations are vulnerable to specific perturbations on code representations. This motivates us to rethink the bane of software vulnerability detection and find function-agnostic features during code representation which we name as semantic redundant features. This paper first identifies a tight correlation between function-agnostic triggers and semantic redundant feature space (where the redundant features reside) in these DNN models. For correlation identification, we propose a novel Backdoor-based Semantic Redundancy Exploration (BSemRE) framework. In BSemRE, the sensitivity of the trained models to function-agnostic triggers is observed to verify the existence of semantic redundancy in various code representations. Specifically, acting as the typical manifestations of semantic redundancy, naming conventions, ternary operators and identically-true conditions are exploited to generate function-agnostic triggers. Extensive comparative experiments on 1613823 samples of 8 representative vulnerability datasets and state-of-the-art code representation techniques and vulnerability detection models demonstrate that the existence of semantic redundancy determines the upper trustworthiness limit of DNN-based software vulnerability detection. To the best of our knowledge, this is the first work exploring the bane of software vulnerability detection using backdoor triggers.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Software Engineering and Methodology 工程技术-计算机：软件工程

CiteScore

6.30

自引率

4.50%

发文量

164

审稿时长

>12 weeks

期刊介绍： Designing and building a large, complex software system is a tremendous challenge. ACM Transactions on Software Engineering and Methodology (TOSEM) publishes papers on all aspects of that challenge: specification, design, development and maintenance. It covers tools and methodologies, languages, data structures, and algorithms. TOSEM also reports on successful efforts, noting practical lessons that can be scaled and transferred to other projects, and often looks at applications of innovative technologies. The tone is scholarly but readable; the content is worthy of study; the presentation is effective.