Y. Alshehri, Noha Alnazzawi, Haneen Hijazi, Rawan K. Alharbi
{"title":"Stratifying large software files to improve prediction performance in software defect prediction","authors":"Y. Alshehri, Noha Alnazzawi, Haneen Hijazi, Rawan K. Alharbi","doi":"10.1145/3543895.3543924","DOIUrl":null,"url":null,"abstract":"Size is one of the significant factors associated with bugs, and it has been used to predict software faults. We believe that stratifying software files based on size can play an essential role in improving prediction performance. This study explored the effect of size by stratifying our sample based on each unit’s size and distributing software units in multiple stratified groups based on an equal distribution approach. We stratified the Eclipse Europa project files, and we reported the performance of each stratified group and compared them. We used two popular classifiers, decision tree J48, and random forest, to implement this experiment. These classifiers presented similar results on the same group of files. The results indicated that predicting faults with large files is better than predicting those in small files. In addition, the results showed higher median values of all performance measures and less variation in each measure.","PeriodicalId":191129,"journal":{"name":"Proceedings of the 9th International Conference on Applied Computing & Information Technology","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 9th International Conference on Applied Computing & Information Technology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3543895.3543924","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Size is one of the significant factors associated with bugs, and it has been used to predict software faults. We believe that stratifying software files based on size can play an essential role in improving prediction performance. This study explored the effect of size by stratifying our sample based on each unit’s size and distributing software units in multiple stratified groups based on an equal distribution approach. We stratified the Eclipse Europa project files, and we reported the performance of each stratified group and compared them. We used two popular classifiers, decision tree J48, and random forest, to implement this experiment. These classifiers presented similar results on the same group of files. The results indicated that predicting faults with large files is better than predicting those in small files. In addition, the results showed higher median values of all performance measures and less variation in each measure.