Feature transformation for improved software bug detection and commit classification

IF 3.7 2区计算机科学 Q1 COMPUTER SCIENCE, SOFTWARE ENGINEERING

Journal of Systems and Software Pub Date : 2024-09-06 DOI:10.1016/j.jss.2024.112205

Sakib Mostafa, Shamse Tasnim Cynthia, Banani Roy, Debajyoti Mondal

{"title":"Feature transformation for improved software bug detection and commit classification","authors":"Sakib Mostafa, Shamse Tasnim Cynthia, Banani Roy, Debajyoti Mondal","doi":"10.1016/j.jss.2024.112205","DOIUrl":null,"url":null,"abstract":"<div><p>Testing and debugging software to fix bugs is considered one of the most important stages of the software life cycle. Many studies have investigated ways to predict bugs in software artifacts using machine learning techniques. It is important to consider the explanatory aspects of such models for reliable prediction. In this paper, we show how feature transformation can significantly improve prediction accuracy and provide insight into the inner workings of bug prediction models. We propose a new approach for bug prediction that first extracts the features, then finds a weighted transformation of these features using a genetic algorithm that best separates bugs from non-bugs when plotted in a low-dimensional space, and finally, trains predictive models using the transformed dataset. In our experiment using the proposed feature transformation, the traditional machine learning and deep learning classifiers achieved an average improvement of 4.25% and 9.6% in recall values for bug classification over 8 software systems compared to the models built on original data. We also examined the generalizability of our concept for multiclass classification tasks such as commit classification in software systems and found modest improvements in F1-scores (sometimes up to 3%) for traditional machine learning models and 4% with deep learning models.</p></div>","PeriodicalId":51099,"journal":{"name":"Journal of Systems and Software","volume":"219 ","pages":"Article 112205"},"PeriodicalIF":3.7000,"publicationDate":"2024-09-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0164121224002498/pdfft?md5=24be736d13c3422f3ae6248d88baf8da&pid=1-s2.0-S0164121224002498-main.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Systems and Software","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0164121224002498","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}

引用次数: 0

Abstract

Testing and debugging software to fix bugs is considered one of the most important stages of the software life cycle. Many studies have investigated ways to predict bugs in software artifacts using machine learning techniques. It is important to consider the explanatory aspects of such models for reliable prediction. In this paper, we show how feature transformation can significantly improve prediction accuracy and provide insight into the inner workings of bug prediction models. We propose a new approach for bug prediction that first extracts the features, then finds a weighted transformation of these features using a genetic algorithm that best separates bugs from non-bugs when plotted in a low-dimensional space, and finally, trains predictive models using the transformed dataset. In our experiment using the proposed feature transformation, the traditional machine learning and deep learning classifiers achieved an average improvement of 4.25% and 9.6% in recall values for bug classification over 8 software systems compared to the models built on original data. We also examined the generalizability of our concept for multiclass classification tasks such as commit classification in software systems and found modest improvements in F1-scores (sometimes up to 3%) for traditional machine learning models and 4% with deep learning models.

查看原文本刊更多论文

改进软件错误检测和提交分类的特征转换

测试和调试软件以修复错误被认为是软件生命周期中最重要的阶段之一。许多研究都在探讨如何利用机器学习技术预测软件工件中的错误。要进行可靠的预测，必须考虑这些模型的解释性方面。在本文中，我们展示了特征转换如何显著提高预测准确性，并深入探讨了错误预测模型的内部工作原理。我们提出了一种新的错误预测方法，该方法首先提取特征，然后使用遗传算法对这些特征进行加权变换，在低维空间中绘制出最能区分错误与非错误的图像，最后使用变换后的数据集训练预测模型。在我们使用所提出的特征转换进行的实验中，与基于原始数据构建的模型相比，传统机器学习和深度学习分类器在 8 个软件系统的错误分类召回值上平均提高了 4.25% 和 9.6%。我们还检验了我们的概念在多类分类任务（如软件系统中的提交分类）中的通用性，发现传统机器学习模型的 F1 分数略有提高（有时可达 3%），而深度学习模型的 F1 分数提高了 4%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Systems and Software 工程技术-计算机：理论方法

CiteScore

8.60

自引率

5.70%

发文量

193

审稿时长

16 weeks

期刊介绍： The Journal of Systems and Software publishes papers covering all aspects of software engineering and related hardware-software-systems issues. All articles should include a validation of the idea presented, e.g. through case studies, experiments, or systematic comparisons with other approaches already in practice. Topics of interest include, but are not limited to: •Methods and tools for, and empirical studies on, software requirements, design, architecture, verification and validation, maintenance and evolution •Agile, model-driven, service-oriented, open source and global software development •Approaches for mobile, multiprocessing, real-time, distributed, cloud-based, dependable and virtualized systems •Human factors and management concerns of software development •Data management and big data issues of software systems •Metrics and evaluation, data mining of software development resources •Business and economic aspects of software development processes The journal welcomes state-of-the-art surveys and reports of practical experience for all of these topics.