Stratifying large software files to improve prediction performance in software defect prediction

Y. Alshehri, Noha Alnazzawi, Haneen Hijazi, Rawan K. Alharbi
Proceedings of the 9th International Conference on Applied Computing & Information Technology, 2022-05-25
DOI: 10.1145/3543895.3543924 (https://doi.org/10.1145/3543895.3543924)

Abstract

Size is one of the significant factors associated with bugs, and it has been used to predict software faults. We believe that stratifying software files by size can play an essential role in improving prediction performance. This study explored the effect of size by stratifying our sample according to each unit’s size and distributing the software units across multiple stratified groups using an equal distribution approach. We stratified the files of the Eclipse Europa project, reported the performance of each stratified group, and compared the groups. We ran the experiment with two popular classifiers, the J48 decision tree and random forest, which produced similar results on the same group of files. The results indicated that faults are predicted better in large files than in small files. In addition, the results showed higher median values for all performance measures and less variation in each measure.
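
The stratification step described above might be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that the "equal distribution approach" means equal-frequency grouping (each stratum receives roughly the same number of files), and the function name `stratify_by_size`, the file names, and the sizes are all hypothetical.

```python
from typing import List, Tuple

FileInfo = Tuple[str, int]  # (file name, size, e.g. lines of code)


def stratify_by_size(files: List[FileInfo], n_groups: int) -> List[List[FileInfo]]:
    """Split files into n_groups strata of near-equal count, smallest files first.

    Assumption: "equal distribution" is read here as equal-frequency binning.
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    ordered = sorted(files, key=lambda f: f[1])
    base, extra = divmod(len(ordered), n_groups)
    groups: List[List[FileInfo]] = []
    start = 0
    for i in range(n_groups):
        # The first `extra` groups take one additional file,
        # so group sizes differ by at most one.
        end = start + base + (1 if i < extra else 0)
        groups.append(ordered[start:end])
        start = end
    return groups


# Hypothetical sizes for illustration only (not data from the study).
sample = [
    ("A.java", 120), ("B.java", 15), ("C.java", 900), ("D.java", 60),
    ("E.java", 300), ("F.java", 45), ("G.java", 2000),
]
strata = stratify_by_size(sample, 3)
for index, group in enumerate(strata):
    print(index, [size for _, size in group])
```

Under this reading, a classifier would then be trained and evaluated separately on each stratum, and the per-stratum performance measures compared across groups.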