Automated Parameter Optimization of Classification Techniques for Defect Prediction Models

2016 IEEE/ACM 38th International Conference on Software Engineering (ICSE) Pub Date : 2016-05-14 DOI:10.1145/2884781.2884857

C. Tantithamthavorn, Shane McIntosh, A. Hassan, Ken-ichi Matsumoto

Pages: 321-332

Citations: 309

Abstract

Defect prediction models are classifiers that are trained to identify defect-prone software modules. Such classifiers have configurable parameters that control their characteristics (e.g., the number of trees in a random forest classifier). Recent studies show that these classifiers may underperform due to the use of suboptimal default parameter settings. However, it is impractical to assess all of the possible settings in the parameter spaces. In this paper, we investigate the performance of defect prediction models where Caret, an automated parameter optimization technique, has been applied. Through a case study of 18 datasets from systems that span both proprietary and open source domains, we find that (1) Caret improves the AUC performance of defect prediction models by as much as 40 percentage points; (2) Caret-optimized classifiers are at least as stable as (with 35% of them being more stable than) classifiers that are trained using the default settings; and (3) Caret increases the likelihood of producing a top-performing classifier by as much as 83%. Hence, we conclude that parameter settings can indeed have a large impact on the performance of defect prediction models, suggesting that researchers should experiment with the parameters of the classification techniques. Since automated parameter optimization techniques like Caret yield substantial benefits in terms of performance improvement and stability, while incurring a manageable additional computational cost, they should be included in future defect prediction studies.
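The general idea behind automated parameter optimization can be illustrated with a minimal sketch: evaluate a small set of candidate parameter settings by cross-validated AUC and keep the best one. This is not Caret (an R package) and not the authors' experimental procedure. The k-nearest-neighbours classifier, the candidate values, the choice of k = 1 as the "default", and the synthetic data are all assumptions made only for illustration.

```python
import random


def auc(scores, labels):
    """AUC via the Mann-Whitney statistic; tied scores count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one module of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def knn_score(train, x, k):
    """Fraction of defective modules among the k nearest training modules."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row[0], x))
    nearest = sorted(train, key=sq_dist)[:k]
    return sum(y for _, y in nearest) / len(nearest)


def cv_auc(data, k, folds=5):
    """Mean AUC of a k-NN classifier across interleaved cross-validation folds."""
    aucs = []
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        scores = [knn_score(train, x, k) for x, _ in test]
        aucs.append(auc(scores, [y for _, y in test]))
    return sum(aucs) / folds


def tune(data, candidates, folds=5):
    """Evaluate every candidate setting and return (best setting, all results)."""
    results = {k: cv_auc(data, k, folds) for k in candidates}
    best = max(results, key=results.get)
    return best, results


def make_data(n=150, seed=0):
    """Hypothetical synthetic modules: two metrics each, every third one defective."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        y = 1 if i % 3 == 0 else 0
        x = (rng.gauss(1.0 * y, 1.0), rng.gauss(0.5 * y, 1.0))
        data.append((x, y))
    return data


if __name__ == "__main__":
    data = make_data()
    # k = 1 stands in for an untuned "default" setting (an assumption).
    best_k, results = tune(data, candidates=[1, 5, 9, 13])
    for k, score in results.items():
        print(f"k={k:2d}  mean AUC={score:.3f}")
    print("selected k:", best_k)
```

Because the default setting is itself one of the candidates, the selected setting can never score worse than the default on the cross-validation estimate. Whether that gain carries over to unseen data is the kind of question the study examines empirically.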

