Within-Project Software Aging Defect Prediction Based on Active Learning
Authors: Mengting Liang, Dimeng Li, Bin Xu, Dongdong Zhao, Xiao Yu, Jianwen Xiang
Venue: 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
Published: 2021-10-01
DOI: 10.1109/ISSREW53611.2021.00037 (https://doi.org/10.1109/ISSREW53611.2021.00037)
Citations: 2
Abstract
Long-running software systems tend to exhibit performance degradation and an increasing failure rate, a phenomenon known as software aging. The bugs that cause this phenomenon are called Aging-Related Bugs (ARBs); they may cause serious economic loss or even endanger human safety. ARB prediction has been proposed to help discover and remove ARBs. However, an ARB prediction model often needs a large amount of labeled training data to reach high classification performance. In practice, labeled data are often scarce, and labeling all samples manually is difficult. Furthermore, ARB datasets suffer from a serious class imbalance problem. To address these two problems (scarce labeled data and class imbalance), we propose a framework named QUIRE-HUE. On the one hand, we use an approach named Active Learning by Querying Informative and Representative Examples (QUIRE) to select a few informative and representative samples to label for the training set, which reduces the labeling cost while still yielding a high-performance classification model. On the other hand, we apply a Hashing-Based Undersampling Ensemble (HUE), which constructs diversified training subspaces for undersampling, to alleviate the class imbalance problem. Experiments are performed on two large open-source projects (MySQL and Linux) with six different machine learning classifiers, using Balance and AUC as the evaluation metrics. The results are encouraging: the average AUC and Balance are 0.769 and 0.812, respectively, on the MySQL dataset and 0.772 and 0.828, respectively, on the Linux dataset, and QUIRE-HUE significantly outperforms all baseline methods.
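
The abstract names Balance as an evaluation metric but does not define it. The sketch below assumes the definition commonly used in the defect prediction literature: one minus the normalized Euclidean distance from the ideal ROC point (probability of false alarm 0, probability of detection 1). The function name `balance` and the confusion-matrix arguments are hypothetical and chosen for illustration; the paper's exact computation may differ.

```python
import math


def balance(tp: int, fn: int, fp: int, tn: int) -> float:
    """Balance under the assumed definition:
    1 - sqrt((0 - pf)^2 + (1 - pd)^2) / sqrt(2).

    tp, fn, fp, tn are confusion-matrix counts, with the ARB-prone
    class treated as the positive class (an assumption).
    """
    # Probability of detection (recall on the positive class).
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    # Probability of false alarm (false positive rate).
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    # Distance from the ideal point (pf=0, pd=1), scaled into [0, 1].
    distance = math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2)
    return 1.0 - distance / math.sqrt(2.0)


if __name__ == "__main__":
    # Made-up counts for illustration only; not data from the paper.
    print(round(balance(tp=8, fn=2, fp=10, tn=80), 3))
```

Under this definition, a classifier that detects every positive with no false alarms scores 1, and one that misses every positive while flagging every negative scores 0. This makes Balance less sensitive than accuracy to the class imbalance the abstract describes.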