Within-Project Software Aging Defect Prediction Based on Active Learning
Authors: Mengting Liang, Dimeng Li, Bin Xu, Dongdong Zhao, Xiao Yu, Jianwen Xiang
Venue: 2021 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)
Published: 2021-10-01
DOI: https://doi.org/10.1109/ISSREW53611.2021.00037
Citations: 2
Abstract
Long-running software systems tend to exhibit performance degradation and an increasing failure rate, a phenomenon known as software aging. The bugs that cause it are called Aging-Related Bugs (ARBs); they may cause serious economic loss or even endanger human safety. ARB prediction has been proposed to discover and remove ARBs. However, an ARB prediction model usually needs a large amount of labeled training data to achieve high classification performance. In practice, labeled data are often scarce, and labeling every sample manually is difficult. Furthermore, ARB datasets suffer from severe class imbalance. To address these two problems, label scarcity and class imbalance, we propose a framework named QUIRE-HUE. On the one hand, we use Active Learning by Querying Informative and Representative Examples (QUIRE) to select a few informative and representative samples to label for the training set, which reduces labeling cost while still yielding a high-performance classification model. On the other hand, we apply a Hashing-Based Undersampling Ensemble (HUE), which constructs diversified training subspaces for undersampling, to alleviate the class imbalance problem. Experiments are performed on two large open-source projects (MySQL and Linux) with six different machine learning classifiers, using Balance and AUC as the evaluation metrics. The average AUC and Balance of QUIRE-HUE are 0.769 and 0.812, respectively, on the MySQL dataset and 0.772 and 0.828, respectively, on the Linux dataset, significantly outperforming all baseline methods.
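The abstract names Balance and AUC as evaluation metrics but does not define them. The following is a minimal sketch, assuming the definition of Balance commonly used in defect-prediction studies (one minus the normalized Euclidean distance from the classifier's (pd, pf) point to the ideal point pd = 1, pf = 0) and the pairwise-ranking interpretation of AUC. The function names and the example counts are hypothetical and are not taken from the paper.

```python
import math


def balance(pd: float, pf: float) -> float:
    """Balance = 1 - distance from (pf, pd) to the ideal point (0, 1), scaled by sqrt(2).

    pd: probability of detection (recall on the ARB class).
    pf: probability of false alarm (false-positive rate).
    Assumed definition; the abstract itself does not spell it out.
    """
    return 1.0 - math.sqrt((0.0 - pf) ** 2 + (1.0 - pd) ** 2) / math.sqrt(2.0)


def balance_from_counts(tp: int, fp: int, tn: int, fn: int) -> float:
    """Compute Balance from confusion-matrix counts (ARB = positive class)."""
    pd = tp / (tp + fn) if (tp + fn) else 0.0
    pf = fp / (fp + tn) if (fp + tn) else 0.0
    return balance(pd, pf)


def auc(labels, scores) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Quadratic in the number of samples, which is fine for a small illustration.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical, heavily imbalanced test set: 10 ARB files, 100 non-ARB files.
    print(balance_from_counts(tp=8, fp=10, tn=90, fn=2))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Under this definition, Balance rewards a classifier only when it both detects ARBs and keeps false alarms low. That makes it more informative than plain accuracy on imbalanced data such as the ARB datasets described above.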