On the effects of program slicing for vulnerability detection during code inspection.

IF 3.5 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Empirical Software Engineering Pub Date : 2025-01-01 Epub Date: 2025-04-05 DOI:10.1007/s10664-025-10636-y

Aurora Papotti, Katja Tuma, Fabio Massacci

{"title":"On the effects of program slicing for vulnerability detection during code inspection.","authors":"Aurora Papotti, Katja Tuma, Fabio Massacci","doi":"10.1007/s10664-025-10636-y","DOIUrl":null,"url":null,"abstract":"<p><p>Slicing is a fault localization technique that has been proposed to support debugging and program comprehension. Yet, its empirical effectiveness during code inspection by humans has received limited attention. The goal of our study is two-fold. First, we aim to define what it means for a code reviewer to identify the vulnerable lines correctly. Second, we investigate whether reducing the number of to-be-inspected lines by method-level slicing supports code reviewers in detecting security vulnerabilities. We propose a novel approach based on the notion of a <math><mi>δ</mi></math> -neighborhood (intuitively based on the idea of the context size of the command git diff) to define correctly identified lines. Then, we conducted a multi-year controlled experiment (2017-2023) in which MSc students attending security courses ( <math><mrow><mi>n</mi> <mo>=</mo> <mn>236</mn></mrow> </math> ) were tasked with identifying vulnerable lines in original or sliced Java files from Apache Tomcat. We provide perfect seed lines for a slicing algorithm to control for confounding factors. Each treatment differs in the pair (Vulnerability, Original/Sliced) with a balanced design with vulnerabilities from the OWASP Top 10 2017: A1 (Injection), A5 (Broken Access Control), A6 (Security Misconfiguration), and A7 (Cross-Site Scripting). To generate smaller slices for human consumption, we used a variant of intra-procedural thin slicing. We report the results for <math><mrow><mi>δ</mi> <mo>=</mo> <mn>0</mn></mrow> </math> which corresponds to exactly matching the vulnerable ground truth lines, and <math><mrow><mi>δ</mi> <mo>=</mo> <mn>3</mn></mrow> </math> which represents the scenario of identifying the vulnerable area. For both cases, we found that slicing helps in 'finding something' (the participant has found at least some vulnerable lines) as opposed to 'finding nothing'. For the case of <math><mrow><mi>δ</mi> <mo>=</mo> <mn>0</mn></mrow> </math> analyzing a slice and analyzing the original file are statistically equivalent from the perspective of lines found by those who found something. With <math><mrow><mi>δ</mi> <mo>=</mo> <mn>3</mn></mrow> </math> slicing helps to find more vulnerabilities compared to analyzing an original file, as we would normally expect. Given the type of population, additional experiments are necessary to be generalized to experienced developers.</p>","PeriodicalId":11525,"journal":{"name":"Empirical Software Engineering","volume":"30 3","pages":"93"},"PeriodicalIF":3.5000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11972194/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Empirical Software Engineering","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s10664-025-10636-y","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/4/5 0:00:00","PubModel":"Epub","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Slicing is a fault localization technique that has been proposed to support debugging and program comprehension. Yet, its empirical effectiveness during code inspection by humans has received limited attention. The goal of our study is two-fold. First, we aim to define what it means for a code reviewer to identify the vulnerable lines correctly. Second, we investigate whether reducing the number of to-be-inspected lines by method-level slicing supports code reviewers in detecting security vulnerabilities. We propose a novel approach based on the notion of a $δ$ -neighborhood (intuitively based on the idea of the context size of the command git diff) to define correctly identified lines. Then, we conducted a multi-year controlled experiment (2017-2023) in which MSc students attending security courses ( $n = 236$ ) were tasked with identifying vulnerable lines in original or sliced Java files from Apache Tomcat. We provide perfect seed lines for a slicing algorithm to control for confounding factors. Each treatment differs in the pair (Vulnerability, Original/Sliced) with a balanced design with vulnerabilities from the OWASP Top 10 2017: A1 (Injection), A5 (Broken Access Control), A6 (Security Misconfiguration), and A7 (Cross-Site Scripting). To generate smaller slices for human consumption, we used a variant of intra-procedural thin slicing. We report the results for $δ = 0$ which corresponds to exactly matching the vulnerable ground truth lines, and $δ = 3$ which represents the scenario of identifying the vulnerable area. For both cases, we found that slicing helps in 'finding something' (the participant has found at least some vulnerable lines) as opposed to 'finding nothing'. For the case of $δ = 0$ analyzing a slice and analyzing the original file are statistically equivalent from the perspective of lines found by those who found something. With $δ = 3$ slicing helps to find more vulnerabilities compared to analyzing an original file, as we would normally expect. Given the type of population, additional experiments are necessary to be generalized to experienced developers.

查看原文本刊更多论文

论程序切片在代码检查过程中漏洞检测中的作用。

切片是一种支持调试和程序理解的故障定位技术。然而，它在人类代码检查中的经验有效性受到了有限的关注。我们研究的目的是双重的。首先，我们的目标是定义代码审查者正确识别易受攻击的行意味着什么。其次，我们研究了通过方法级切片减少待检查行的数量是否支持代码审查者检测安全漏洞。我们提出了一种基于δ邻域概念的新方法（直观地基于命令git diff的上下文大小的想法）来定义正确识别的行。然后，我们进行了一项多年的对照实验（2017-2023），其中参加安全课程的理学硕士学生（n = 236）的任务是识别来自Apache Tomcat的原始或切片Java文件中的脆弱行。我们为切片算法提供了完美的种子线，以控制混杂因素。每种处理方法都不同（漏洞，原始/切片），并采用平衡设计，其中包含OWASP 2017年十大漏洞：A1（注入），A5（破坏访问控制），A6（安全错误配置）和A7（跨站点脚本）。为了生成供人类食用的更小的切片，我们使用了程序内薄切片的一种变体。我们报告了δ = 0和δ = 3的结果，δ = 0对应于脆弱地面真实线的精确匹配，δ = 3代表识别脆弱区域的场景。在这两种情况下，我们都发现切片有助于“找到一些东西”（参与者至少找到了一些脆弱的线条），而不是“什么都找不到”。对于δ = 0的情况，从发现某些东西的人发现的行的角度来看，分析切片和分析原始文件在统计上是相等的。与分析原始文件相比，δ = 3切片有助于发现更多漏洞，正如我们通常所期望的那样。考虑到人口的类型，有必要向有经验的开发人员推广额外的实验。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Empirical Software Engineering 工程技术-计算机：软件工程

CiteScore

8.50

自引率

12.20%

发文量

169

审稿时长

>12 weeks

期刊介绍： Empirical Software Engineering provides a forum for applied software engineering research with a strong empirical component, and a venue for publishing empirical results relevant to both researchers and practitioners. Empirical studies presented here usually involve the collection and analysis of data and experience that can be used to characterize, evaluate and reveal relationships between software development deliverables, practices, and technologies. Over time, it is expected that such empirical results will form a body of knowledge leading to widely accepted and well-formed theories. The journal also offers industrial experience reports detailing the application of software technologies - processes, methods, or tools - and their effectiveness in industrial settings. Empirical Software Engineering promotes the publication of industry-relevant research, to address the significant gap between research and practice.