利用源代码度量和静态分析估算错误严重性的实证研究

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software Pub Date : 2024-08-06 DOI:10.1016/j.jss.2024.112179

Ehsan Mashhadi , Shaiful Chowdhury , Somayeh Modaberi , Hadi Hemmati , Gias Uddin

{"title":"利用源代码度量和静态分析估算错误严重性的实证研究","authors":"Ehsan Mashhadi , Shaiful Chowdhury , Somayeh Modaberi , Hadi Hemmati , Gias Uddin","doi":"10.1016/j.jss.2024.112179","DOIUrl":null,"url":null,"abstract":"<div><p>In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, which is not the case in practice. The more severe the bugs the higher their consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics, and two popular static analysis tools (SpotBugs and Infer) for analyzing their capability to predict defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are useful in predicting buggy code (Lines of the Code, Maintainable Index, FanOut, and Effort metrics are the best), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%–7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating their severity. Also, our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity. Finally, we discuss the practical implications of the results and propose new directions for future research.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"217 ","pages":"Article 112179"},"PeriodicalIF":3.7000,"publicationDate":"2024-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002243/pdfft?md5=a39163bbaede6f3d8ae30da169d41fa3&pid=1-s2.0-S0164121224002243-main.pdf","citationCount":"0","resultStr":"{\"title\":\"An empirical study on bug severity estimation using source code metrics and static analysis\",\"authors\":\"Ehsan Mashhadi , Shaiful Chowdhury , Somayeh Modaberi , Hadi Hemmati , Gias Uddin\",\"doi\":\"10.1016/j.jss.2024.112179\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, which is not the case in practice. The more severe the bugs the higher their consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics, and two popular static analysis tools (SpotBugs and Infer) for analyzing their capability to predict defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are useful in predicting buggy code (Lines of the Code, Maintainable Index, FanOut, and Effort metrics are the best), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%–7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating their severity. Also, our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity. Finally, we discuss the practical implications of the results and propose new directions for future research.</p></div>\",\"PeriodicalId\":51099,\"journal\":{\"name\":\"Journal of Systems and Software\",\"volume\":\"217 \",\"pages\":\"Article 112179\"},\"PeriodicalIF\":3.7000,\"publicationDate\":\"2024-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.sciencedirect.com/science/article/pii/S0164121224002243/pdfft?md5=a39163bbaede6f3d8ae30da169d41fa3&pid=1-s2.0-S0164121224002243-main.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Systems and Software\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0164121224002243\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems and Software","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0164121224002243","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

摘要

在过去的几十年里，大量的研究工作都致力于软件缺陷（即缺陷）的预测。一般来说，这些研究利用各种指标、工具和技术来预测哪些类、方法、行或提交存在缺陷。然而，该领域的大多数现有工作都对所有错误一视同仁，而实际情况并非如此。错误越严重，后果越严重。因此，对于缺陷预测方法来说，重要的是要估算出已识别错误的严重程度，以便让严重程度较高的错误立即得到关注。在本文中，我们对两个流行的数据集（Defects4J 和 Bugs.jar）进行了定量和定性研究，使用了 10 个常见的源代码度量指标和两个流行的静态分析工具（SpotBugs 和 Infer），以分析它们预测缺陷及其严重性的能力。我们研究了来自 19 个 Java 开源项目的 3,358 个具有不同严重性标签的缺陷方法。结果表明，虽然代码度量在预测代码缺陷方面很有用（代码行数、可维护指数、FanOut 和努力度量是最好的），但它们无法估计缺陷的严重程度。此外，我们还观察到静态分析工具在预测错误（F1 分数范围为 3.1%-7.1%）及其严重性标签（F1 分数低于 2%）方面都表现不佳。我们还人工研究了严重漏洞的特征，以找出代码度量和静态分析工具在估计漏洞严重性方面表现不佳的可能原因。此外，我们的分类显示，安全漏洞在大多数情况下具有较高的严重性，而边缘/边界故障则具有较低的严重性。最后，我们讨论了这些结果的实际意义，并提出了未来研究的新方向。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An empirical study on bug severity estimation using source code metrics and static analysis

In the past couple of decades, significant research efforts have been devoted to the prediction of software bugs (i.e., defects). In general, these works leverage a diverse set of metrics, tools, and techniques to predict which classes, methods, lines, or commits are buggy. However, most existing work in this domain treats all bugs the same, which is not the case in practice. The more severe the bugs the higher their consequences. Therefore, it is important for a defect prediction method to estimate the severity of the identified bugs, so that the higher severity ones get immediate attention. In this paper, we provide a quantitative and qualitative study on two popular datasets (Defects4J and Bugs.jar), using 10 common source code metrics, and two popular static analysis tools (SpotBugs and Infer) for analyzing their capability to predict defects and their severity. We studied 3,358 buggy methods with different severity labels from 19 Java open-source projects. Results show that although code metrics are useful in predicting buggy code (Lines of the Code, Maintainable Index, FanOut, and Effort metrics are the best), they cannot estimate the severity level of the bugs. In addition, we observed that static analysis tools have weak performance in both predicting bugs (F1 score range of 3.1%–7.1%) and their severity label (F1 score under 2%). We also manually studied the characteristics of the severe bugs to identify possible reasons behind the weak performance of code metrics and static analysis tools in estimating their severity. Also, our categorization shows that Security bugs have high severity in most cases while Edge/Boundary faults have low severity. Finally, we discuss the practical implications of the results and propose new directions for future research.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Systems and Software 工程技术-计算机：理论方法

CiteScore

8.60

自引率

5.70%

发文量

193

审稿时长

16 weeks

期刊介绍： The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to: •Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution •Agile, model-driven, service-oriented, open source and global software development •Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems •Human factors and management concerns of software development •Data management and big data issues of software systems •Metrics and evaluation, data mining of software development resources •Business and economic aspects of software development processes The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.