MBL-CPDP: A Multi-Objective Bilevel Method for Cross-Project Defect Prediction
Jiaxin Chen; Jinliang Ding; Kay Chen Tan; Jiancheng Qian; Ke Li
IEEE Transactions on Software Engineering, vol. 51, no. 8, pp. 2305-2328. Publication date: 2025-06-10. DOI: 10.1109/TSE.2025.3577808
https://ieeexplore.ieee.org/document/11029502/
Cross-project defect prediction (CPDP) leverages machine learning (ML) techniques to proactively identify software defects, especially where project-specific data is scarce. However, existing CPDP approaches suffer from three critical limitations: ineffective exploration of high-dimensional parameter spaces, poor adaptability across diverse projects with heterogeneous data distributions, and inadequate handling of feature redundancy and distribution discrepancies between source and target projects. To address these challenges, we formulate CPDP as a multi-objective bilevel optimization (MBLO) method, dubbed MBL-CPDP. Our approach comprises two nested problems: the upper-level, a multi-objective combinatorial optimization problem, enhances robustness by optimizing ML pipelines that integrate feature selection, transfer learning, and classification techniques, while the lower-level problem fine-tunes their hyperparameters. Unlike traditional methods that employ fragmented optimization strategies or single-objective approaches that introduce bias, MBL-CPDP provides a holistic, end-to-end optimization framework. Additionally, we propose an ensemble learning method to better capture cross-project distribution differences and improve generalization across diverse datasets. An MBLO algorithm is then presented to effectively solve the formulated MBLO problem. To evaluate MBL-CPDP’s performance, we compare it with five automated ML tools and 50 CPDP techniques across 20 projects. Extensive empirical results show that MBL-CPDP outperforms the comparison methods, demonstrating its superior adaptability and comprehensive performance evaluation capability.
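The nested structure described in the abstract can be illustrated with a toy sketch. This is not the authors' MBLO algorithm: the component names, the synthetic scoring function, the single hyperparameter, and the use of exhaustive enumeration with random search are all assumptions made for illustration. It shows only the general shape of a bilevel search, in which an upper level chooses a pipeline (feature selection, transfer learning, classifier), a lower level tunes a hyperparameter for that pipeline, and the upper level keeps the non-dominated pipelines under two objectives.

```python
import itertools
import random

# Hypothetical component choices; the abstract does not list the actual candidates.
FEATURE_SELECTORS = ["none", "variance_filter"]
TRANSFER_METHODS = ["none", "instance_reweighting"]
CLASSIFIERS = ["logistic", "tree"]


def toy_objectives(pipeline, hyperparameter):
    """Stand-in for training and evaluating a pipeline; returns two scores to maximize.

    A real setup would fit the pipeline on source-project data and score it
    on held-out data. Here the scores are synthetic and deterministic.
    """
    base = sum(len(name) for name in pipeline) % 7 / 10.0
    peak = (len(pipeline[2]) % 5) / 5.0  # best hyperparameter depends on the classifier
    quality = base + 0.3 - (hyperparameter - peak) ** 2
    simplicity = 1.0 - 0.2 * sum(name != "none" for name in pipeline[:2])
    return quality, simplicity


def lower_level(pipeline, rng, trials=20):
    """Lower level: random search over one hyperparameter for a fixed pipeline."""
    best_h, best_scores = None, None
    for _ in range(trials):
        h = rng.random()
        scores = toy_objectives(pipeline, h)
        # Assumption: the lower level optimizes only the first objective.
        if best_scores is None or scores[0] > best_scores[0]:
            best_h, best_scores = h, scores
    return best_h, best_scores


def dominates(a, b):
    """True if a is at least as good as b on every objective and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def upper_level(seed=0):
    """Upper level: enumerate pipelines, tune each, keep the non-dominated ones."""
    rng = random.Random(seed)
    evaluated = []
    for pipeline in itertools.product(FEATURE_SELECTORS, TRANSFER_METHODS, CLASSIFIERS):
        h, scores = lower_level(pipeline, rng)
        evaluated.append((pipeline, h, scores))
    return [
        cand for cand in evaluated
        if not any(dominates(other[2], cand[2]) for other in evaluated)
    ]


pareto_front = upper_level()
for pipeline, h, scores in pareto_front:
    print(pipeline, round(h, 3), tuple(round(s, 3) for s in scores))
```

Enumeration is feasible here only because the toy search space has eight pipelines; a realistic combinatorial space would likely need a heuristic or evolutionary search at the upper level.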
About the journal:
IEEE Transactions on Software Engineering seeks contributions comprising well-defined theoretical results and empirical studies with potential impacts on software construction, analysis, or management. The scope of this Transactions extends from fundamental mechanisms to the development of principles and their application in specific environments. Specific topic areas include:
a) Development and maintenance methods and models: Techniques and principles for specifying, designing, and implementing software systems, encompassing notations and process models.
b) Assessment methods: Software tests, validation, reliability models, test and diagnosis procedures, software redundancy, design for error control, and measurements and evaluation of process and product aspects.
c) Software project management: Productivity factors, cost models, schedule and organizational issues, and standards.
d) Tools and environments: Specific tools, integrated tool environments, associated architectures, databases, and parallel and distributed processing issues.
e) System issues: Hardware-software trade-offs.
f) State-of-the-art surveys: Syntheses and comprehensive reviews of the historical development within specific areas of interest.