BiLO-CPDP: Bi-Level Programming for Automated Model Discovery in Cross-Project Defect Prediction

2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE) Pub Date : 2020-08-31 DOI:10.1145/3324884.3416617

Kewei Li, Zilin Xiang, Tao-An Chen, K. Tan

{"title":"BiLO-CPDP: Bi-Level Programming for Automated Model Discovery in Cross-Project Defect Prediction","authors":"Kewei Li, Zilin Xiang, Tao-An Chen, K. Tan","doi":"10.1145/3324884.3416617","DOIUrl":null,"url":null,"abstract":"Cross-Project Defect Prediction (CPDP), which borrows data from similar projects by combining a transfer learner with a classifier, have emerged as a promising way to predict software defects when the available data about the target project is insufficient. However, developing such a model is challenge because it is difficult to determine the right combination of transfer learner and classifier along with their optimal hyper-parameter settings. In this paper, we propose a tool, dubbed BiLO-CPDP, which is the first of its kind to formulate the automated CPDP model discovery from the perspective of bi-level programming. In particular, the bi-level programming proceeds the optimization with two nested levels in a hierarchical manner. Specifically, the upper-level optimization routine is designed to search for the right combination of transfer learner and classifier while the nested lower-level optimization routine aims to optimize the corresponding hyper-parameter settings. To evaluate BiLO-CPDP, we conduct experiments on 20 projects to compare it with a total of 21 existing CPDP techniques, along with its single-level optimization variant and Auto-Sklearn, a state-of-the-art automated machine learning tool. Empirical results show that BiLO-CPDP champions better prediction performance than all other 21 existing CPDP techniques on 70% of the projects, while being overwhelmingly superior to Auto-Sklearn and its single-level optimization variant on all cases. Furthermore, the unique bi-level formalization in BiLO-CPDP also permits to allocate more budget to the upper-level, which significantly boosts the performance.","PeriodicalId":106337,"journal":{"name":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"30","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3324884.3416617","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 30

Abstract

Cross-Project Defect Prediction (CPDP), which borrows data from similar projects by combining a transfer learner with a classifier, have emerged as a promising way to predict software defects when the available data about the target project is insufficient. However, developing such a model is challenge because it is difficult to determine the right combination of transfer learner and classifier along with their optimal hyper-parameter settings. In this paper, we propose a tool, dubbed BiLO-CPDP, which is the first of its kind to formulate the automated CPDP model discovery from the perspective of bi-level programming. In particular, the bi-level programming proceeds the optimization with two nested levels in a hierarchical manner. Specifically, the upper-level optimization routine is designed to search for the right combination of transfer learner and classifier while the nested lower-level optimization routine aims to optimize the corresponding hyper-parameter settings. To evaluate BiLO-CPDP, we conduct experiments on 20 projects to compare it with a total of 21 existing CPDP techniques, along with its single-level optimization variant and Auto-Sklearn, a state-of-the-art automated machine learning tool. Empirical results show that BiLO-CPDP champions better prediction performance than all other 21 existing CPDP techniques on 70% of the projects, while being overwhelmingly superior to Auto-Sklearn and its single-level optimization variant on all cases. Furthermore, the unique bi-level formalization in BiLO-CPDP also permits to allocate more budget to the upper-level, which significantly boosts the performance.

查看原文本刊更多论文

BiLO-CPDP:跨项目缺陷预测中自动模型发现的双层编程

跨项目缺陷预测(CPDP)，通过结合迁移学习器和分类器从类似的项目中借用数据，已经成为当目标项目的可用数据不足时预测软件缺陷的一种有前途的方法。然而，开发这样的模型是一个挑战，因为很难确定迁移学习器和分类器的正确组合以及它们的最佳超参数设置。在本文中，我们提出了一个称为BiLO-CPDP的工具，这是同类工具中第一个从双层规划的角度制定自动CPDP模型发现的工具。特别是，双层规划以分层的方式对两个嵌套层进行优化。其中，上层优化例程的目的是寻找迁移学习器和分类器的正确组合，而嵌套下层优化例程的目的是优化相应的超参数设置。为了评估BiLO-CPDP，我们对20个项目进行了实验，将其与现有的21种CPDP技术，以及其单级优化变体和Auto-Sklearn(一种最先进的自动化机器学习工具)进行了比较。实证结果表明，在70%的项目中，BiLO-CPDP的预测性能优于所有其他21种现有的CPDP技术，同时在所有情况下都绝对优于Auto-Sklearn及其单级优化变体。此外，BiLO-CPDP中独特的双层形式化也允许将更多的预算分配给上层，这大大提高了性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

2020 35th IEEE/ACM International Conference on Automated Software Engineering (ASE)

自引率

0.00%

发文量