Latest Articles from IEEE Transactions on Software Engineering

On the Effectiveness of LLM-as-a-Judge for Code Generation and Summarization
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-07-04 DOI: 10.1109/TSE.2025.3586082
Giuseppe Crupi;Rosalia Tufano;Alejandro Velasco;Antonio Mastropaolo;Denys Poshyvanyk;Gabriele Bavota
Abstract: Large Language Models (LLMs) have recently been exploited as judges for complex natural language processing tasks such as question answering (Q&A). The basic idea is to delegate to an LLM the assessment of the "quality" of the output produced by an automated technique (often another LLM) for tasks in which (i) quantitative metrics tell only part of the story, and (ii) a large-scale human evaluation would be too expensive. If proven effective for a specific task, LLMs-as-a-judge can also unlock new possibilities for automation, with several LLMs proposing a solution for a given instance of the task (e.g., an answer to a question) and others judging and deciding which output is best to show the user. We study the effectiveness of LLMs-as-a-judge for two code-related tasks, namely code generation and code summarization. The rationale for choosing these tasks is twofold. First, quantitative metrics are usually not enough to assess code summarizers and generators; for example, it is well documented that metrics such as BLEU are quite weak proxies for the quality of generated summaries. Second, even state-of-the-art techniques still struggle with complex instances of these tasks (e.g., summarizing a long, complex function), making them good candidates for more advanced solutions envisioning collaboration among LLMs. For code generation, we check whether eight LLMs are able to judge the correctness of 1,405 Java methods and 1,281 Python functions generated by the same LLMs or implemented by humans. For code summarization, we compare the judgments of five LLMs to those provided by nine humans for ~1.2k summaries of both Java and Python functions. Our findings show that GPT-4-turbo has the best judging capabilities for both tasks, while "smaller" LLMs with tens of billions of parameters cannot cope with judging tasks. However, even the best-performing LLM frequently misjudges code correctness and summary quality.
Vol. 51, No. 8, pp. 2329-2345.
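A minimal sketch of the judging setup the abstract describes: one or more LLMs grade a candidate implementation, and a panel vote decides. The query_llm helper and the prompt wording are placeholders, not the authors' protocol:

    # Hypothetical helper standing in for any chat-completion API.
    def query_llm(model: str, prompt: str) -> str:
        raise NotImplementedError

    def judge_correctness(model: str, task: str, code: str) -> bool:
        # Ask for a strict YES/NO verdict so the answer is machine-readable.
        prompt = (
            "You are a code reviewer. Given a task and an implementation, "
            "answer strictly YES or NO: is the code functionally correct?\n\n"
            f"Task:\n{task}\n\nImplementation:\n{code}\n"
        )
        return query_llm(model, prompt).strip().upper().startswith("YES")

    def panel_vote(models: list[str], task: str, code: str) -> bool:
        # Several LLMs judge independently; the majority verdict wins,
        # mirroring the multi-judge automation scenario in the abstract.
        votes = [judge_correctness(m, task, code) for m in models]
        return sum(votes) > len(votes) / 2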
Citations: 0
TransferFuzz-Pro: Large Language Model Driven Code Debugging Technology for Verifying Propagated Vulnerability
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-07-03 DOI: 10.1109/TSE.2025.3584774
Siyuan Li;Kaiyu Xie;Yuekang Li;Hong Li;Yimo Ren;Limin Sun;Hongsong Zhu
Abstract: Code reuse in software development frequently facilitates the spread of vulnerabilities, leading to imprecise scopes of affected software in CVE reports. Traditional methods focus primarily on detecting reused vulnerability code in target software but cannot confirm whether these vulnerabilities can be triggered in new software contexts. In previous work, we introduced the TransferFuzz framework to address this gap using historical trace-based fuzzing. However, its effectiveness is constrained by the need for manual intervention and its reliance on source-code instrumentation. To overcome these limitations, we propose TransferFuzz-Pro, a novel framework that integrates Large Language Model (LLM)-driven code debugging technology. By leveraging an LLM for automated, human-like debugging and Proof-of-Concept (PoC) generation, combined with binary-level instrumentation, TransferFuzz-Pro extends verification capabilities to a wider range of targets. Our evaluation shows that TransferFuzz-Pro is significantly faster and can automatically validate vulnerabilities that were previously unverifiable with conventional methods. Notably, it expands the number of affected software instances for 15 CVE-listed vulnerabilities from 15 to 53 and successfully generates PoCs for various Linux distributions. These results demonstrate that TransferFuzz-Pro effectively verifies vulnerabilities introduced by code reuse in target software and automatically generates PoCs.
Vol. 51, No. 8, pp. 2396-2411.
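The core loop of historical trace-guided fuzzing, which TransferFuzz introduced and TransferFuzz-Pro further automates, can be sketched as follows: seeds whose execution traces overlap the known vulnerable call trace are favored. run_with_trace is a hypothetical stand-in for the framework's binary-level instrumentation:

    import random

    def run_with_trace(binary: str, data: bytes):
        """Hypothetical: execute the instrumented binary on `data`,
        returning (crashed, list of functions hit)."""
        raise NotImplementedError

    def trace_overlap(trace, vuln_trace):
        hit = set(trace)
        return sum(1 for fn in vuln_trace if fn in hit) / len(vuln_trace)

    def fuzz(binary, seeds, vuln_trace, rounds=10_000):
        corpus = [(s, 0.0) for s in seeds]  # (input, overlap score)
        for _ in range(rounds):
            # Pick a promising seed and apply a single byte-flip mutation.
            seed, _ = max(random.sample(corpus, min(8, len(corpus))),
                          key=lambda p: p[1])
            data = bytearray(seed)
            if data:
                data[random.randrange(len(data))] ^= random.randrange(1, 256)
            crashed, trace = run_with_trace(binary, bytes(data))
            if crashed:
                return bytes(data)  # candidate PoC, still to be validated
            corpus.append((bytes(data), trace_overlap(trace, vuln_trace)))
        return None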
Citations: 0
COTE: Predicting Code-to-Test Co-Evolution by Integrating Link Analysis and Pre-Trained Language Model Techniques
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-27 DOI: 10.1109/TSE.2025.3583027
Yuyong Liu;Zhifei Chen;Lin Chen;Yanhui Li;Xuansong Li;Wei Song
Abstract: Tests, as essential artifacts, should co-evolve with the production code to ensure that the associated production code satisfies its specification. However, developers often postpone or even forget to update tests, leaving them outdated and lagging behind the code. When predicting which tests need to be updated after a production-code change, it is challenging to identify all related tests and determine their change probabilities under complex change scenarios. This paper fills the gap with a hybrid approach named COTE that predicts code-to-test co-evolution. We first compute linked test candidates based on different code-to-test dependencies. We then identify common co-change patterns by building a method-level dependence graph. For the remaining ambiguous patterns, we leverage a pre-trained language model that captures the semantic features of code and the change reasons contained in commit messages to judge a test's likelihood of being updated. Experiments on our datasets, consisting of 6,314 samples extracted from 5,000 Java projects, show that COTE outperforms state-of-the-art approaches, achieving a precision of 89.0% and a recall of 71.6%. This work can help practitioners reduce test maintenance costs and improve software quality.
Vol. 51, No. 8, pp. 2232-2253.
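The link-analysis step can be pictured with a toy heuristic: map a changed production method to candidate tests via naming and call dependencies. COTE itself builds a method-level dependence graph and falls back to a pre-trained language model for ambiguous cases; the sketch below covers only the obvious links:

    def candidate_tests(changed_method: str,
                        test_callees: dict[str, set[str]]) -> list[str]:
        """test_callees maps each test to the production methods it calls."""
        hits = []
        for test, callees in test_callees.items():
            name_link = changed_method.lower() in test.lower()  # testFoo -> foo
            call_link = changed_method in callees               # direct call
            if name_link or call_link:
                hits.append(test)
        return hits

    print(candidate_tests("parseHeader", {
        "testParseHeader": {"parseHeader"},
        "testFooter": {"parseFooter"},
    }))  # -> ['testParseHeader']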
Citations: 0
OneMoreTest: A Learning-Based Approach to Generating and Selecting Fault-Revealing Unit Tests
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-25 DOI: 10.1109/TSE.2025.3581556
Wei Wei;Yanjie Jiang;Yahui Li;Lu Zhang;Hui Liu
Abstract: Developers often manually design a few unit tests for a method under development. Once such manually designed tests pass, however, they usually turn to automated test-generation tools like EvoSuite and Randoop for more thorough testing. Although automatically generated tests may achieve high coverage, they rarely identify hard-to-detect defects automatically because of the well-known test oracle problem: without an explicit oracle (expected output), it is challenging to tell whether an output is correct. Consequently, developers must manually select and verify a few suspicious test cases to identify hard-to-detect defects. To this end, we propose a novel approach, called OneMoreTest, for generating and selecting the most suspicious tests for manual verification. Based on a manually designed passing test, OneMoreTest automatically generates millions of input-output pairs for the method under test (MUT) with mutation-based fuzzing. It then trains an automatically generated neural network to simulate the MUT's behavior. For new tests automatically generated for the same MUT, OneMoreTest suggests the top-k most suspicious tests, i.e., those with the greatest distance between their actual output and the network's estimated output. Our evaluation on real-world faulty methods suggests that OneMoreTest is accurate: on 70.79% of the 178 real-world faulty methods involved, the defect can be identified by manually verifying only a single test per method, as suggested by OneMoreTest. Compared against the state of the art, OneMoreTest improved precision from 46.63% to 72.62% and recall from 46.63% to 70.79%.
Vol. 51, No. 8, pp. 2346-2365.
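OneMoreTest's ranking step can be sketched as follows, assuming a surrogate model fitted on input-output pairs from the passing version of the method under test (the paper trains an automatically generated neural network; any regressor with a predict method serves for illustration):

    import numpy as np

    def rank_suspicious(surrogate, inputs, actual_outputs, k=5):
        """Rank tests by disagreement between the method's actual output and
        the surrogate's prediction; large gaps suggest anomalous behavior.
        inputs and actual_outputs are 2-D arrays, one row per test."""
        predicted = surrogate.predict(inputs)
        distances = np.linalg.norm(actual_outputs - predicted, axis=1)
        top = np.argsort(-distances)[:k]  # largest disagreement first
        return [(int(i), float(distances[i])) for i in top]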
Citations: 0
Enriching Mutation Testing With Innovative Method Invocation Mutation: Filling the Crucial Missing Piece of the Puzzle
IF 6.5 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-19 DOI: 10.1109/TSE.2025.3573751
Peng Zhang;Zeyu Lu;Yang Wang;Yibiao Yang;Yuming Zhou;Mike Papadakis
Abstract: Mutation testing aims to simulate real-world defects, but existing tools often struggle to replicate method invocation defects accurately. To address this, we propose MIN (Method INvocation mutator), which uses a mapping strategy to pair method names with corresponding values, ensuring that the paired methods share argument and return types. This approach enhances the feasibility and realism of mutants by considering factors such as library methods, access control, inheritance, and static methods. Experimental results show that integrating MIN into Major (a popular mutation tool) improves semantic similarity to real defects by 11%, increases mutant-set diversity to 97.5%, and reduces undetected faults by 38.5%. Furthermore, MIN's performance rivals that of state-of-the-art machine-learning-based mutators like CodeBERT, with a 10x speed advantage over CodeBERT and 4x over DeepMutation in generating compilable mutants. These findings demonstrate that MIN can significantly enhance defect simulation and improve the efficiency of mutation testing.
Vol. 51, No. 7, pp. 2125-2143.
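The type-compatible mapping at the heart of MIN can be illustrated with a toy version: a call site may only be mutated to another method whose parameter and return types match. The real mutator additionally accounts for library methods, access control, inheritance, and static methods:

    from collections import defaultdict

    def signature_groups(methods):
        """methods: name -> (parameter types, return type)."""
        groups = defaultdict(list)
        for name, sig in methods.items():
            groups[sig].append(name)
        return groups

    def invocation_mutants(call, methods):
        # Candidate replacements share the full signature of the original call.
        return [m for m in signature_groups(methods)[methods[call]] if m != call]

    methods = {"max": (("int", "int"), "int"),
               "min": (("int", "int"), "int"),
               "abs": (("int",), "int")}
    print(invocation_mutants("max", methods))  # -> ['min']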
Citations: 0
Boosting Generalizable Fairness With Mahalanobis Distances Guided Boltzmann Exploratory Testing
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-19 DOI: 10.1109/TSE.2025.3581402
Kaixiang Dong;Peng Wu;Yanting Chen
Abstract: Although machine learning models have been remarkably effective for decision-making tasks such as employment, insurance, and criminal justice, it remains urgent yet challenging to ensure that model predictions are reliable and socially fair. This amounts to extensively detecting and repairing potential discriminatory defects of machine learning models with authentic testing data. In this paper, we propose MAEFT, a novel Mahalanobis distance guided Adaptive Exploratory Fairness Testing approach, which searches for individual discriminatory instances (IDIs) through deep reinforcement learning with an adaptive extension of Boltzmann exploration, significantly reducing overestimation. MAEFT uses Mahalanobis distances to guide the search with realistic correlations between input features. By learning a more accurate state-action value approximation, MAEFT can reach a much wider valid input space, sharply reduce the number of duplicate instances visited, and identify more unique tests and IDIs calibrated to realistic feature correlations. Compared with state-of-the-art black-box and white-box fairness testing methods, our approach generates on average 4.65%-161.66% more unique tests and identifies 154.60%-634.80% more IDIs, with a performance speed-up of 12.54%-1313.47%. Moreover, the IDIs identified by MAEFT can be exploited to repair the original models through retraining: they lead to, on average, a 59.15% boost in model fairness, 15.94%-48.73% higher than the boost from IDIs identified by state-of-the-art fairness testing methods. The models retrained with MAEFT also exhibit 37.66%-46.81% stronger generalization ability than those retrained with state-of-the-art fairness testing methods.
Vol. 51, No. 8, pp. 2213-2231.
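The two ingredients the title names can be sketched in isolation: Boltzmann (softmax) exploration over the Q-values of perturbation actions, and a Mahalanobis distance computed from the input-feature covariance so that search moves respect realistic feature correlations. How MAEFT adapts the temperature and wires these into its reinforcement-learning loop is beyond this sketch:

    import numpy as np

    def boltzmann_choice(q_values, temperature, rng):
        # Softmax over Q-values: higher-valued actions are favored, but every
        # action keeps non-zero probability, preserving exploration.
        logits = np.asarray(q_values) / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))

    def mahalanobis(x, y, cov):
        # Distance weighted by the inverse feature covariance, so correlated
        # features are not perturbed as if they were independent.
        d = np.asarray(x) - np.asarray(y)
        return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

    rng = np.random.default_rng(0)
    print(boltzmann_choice([1.0, 2.0, 0.5], temperature=0.5, rng=rng))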
Citations: 0
RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-18 DOI: 10.1109/TSE.2025.3581062
André Silva;Sen Fang;Martin Monperrus
Abstract: Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research with many unexplored dimensions; existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers a state-of-the-art parameter-efficient fine-tuning (PEFT) technique for program repair. The result is a highly effective "program repair adapter" for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with repair-specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps training converge and clearly contributes to RepairLLaMA's effectiveness in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs, outperforming all baselines.
Vol. 51, No. 8, pp. 2366-2380. Open Access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11039501
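Attaching a parameter-efficient "repair adapter" to a code LLM can be sketched with the Hugging Face peft library; the model name and hyperparameters below are illustrative, not the paper's exact configuration:

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    base = AutoModelForCausalLM.from_pretrained("codellama/CodeLlama-7b-hf")
    config = LoraConfig(
        r=8,                      # low-rank dimension: few trainable weights
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(base, config)
    model.print_trainable_parameters()  # only the adapter weights train
    # Fine-tuning then proceeds on pairs of (buggy code in a repair-specific
    # representation, fixed code), which is where the paper locates its gains.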
Citations: 0
Malo in the Code Jungle: Explainable Fault Localization for Decentralized Applications
IF 6.5 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-13 DOI: 10.1109/TSE.2025.3578816
Hui Zhang;Jiajing Wu;Zhiying Wu;Zhe Chen;Dan Lin;Jiachi Chen;Yuren Zhou;Zibin Zheng
Abstract: Decentralized applications (DApps) have long been sitting ducks for hackers due to their valuable cryptocurrency assets, exposing them to various security risks. When a DApp is attacked, promptly identifying faults is crucial to minimizing financial losses and enabling effective fault repair. However, existing fault localization methods, which mostly rely on code coverage, often fall short for DApps, particularly when only one fault case is available. Furthermore, according to a prior survey, most developers expect fault localization tools to provide reasonable explanations. In this paper, we present Malo, a method for DApp-specific explainable fault localization. It identifies fault functions through suspicious token transfer-guided analysis, then employs Large Language Models (LLMs) to generate explanations for the identified fault functions. Specifically, Malo examines function call traces and the source code of fault cases to acquire internal knowledge, and retrieves relevant project documents from the Web to obtain external knowledge. By integrating the two, Malo generates reasonable explanations for faults in DApps. Our evaluation on a dataset of 68 real-world DApp faults shows that Malo locates 62% of faults within the Top-5, 9% higher than the state-of-the-art method. The experimental results also show a remarkable 71% alignment accuracy between the explanations generated by Malo and the ground truth. In addition, a user study confirms that the explanations generated by Malo help developers comprehend the root causes of faults. Our code and dataset are available online: https://github.com/SodalimeZero/Malo_Code.git
Vol. 51, No. 7, pp. 2197-2210.
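A toy rendition of the suspicious token transfer-guided step: functions whose calls coincide with unusually large outbound transfers are flagged and handed to the LLM explanation stage. The record fields and threshold are invented for illustration and are far simpler than Malo's actual analysis:

    def suspicious_functions(trace, threshold):
        """trace: per-call records like {'fn': ..., 'token_out': ...}."""
        flagged = {}
        for call in trace:
            if call["token_out"] > threshold:
                prev = flagged.get(call["fn"], 0.0)
                flagged[call["fn"]] = max(prev, call["token_out"])
        # Rank by the largest transfer each function participated in.
        return sorted(flagged, key=flagged.get, reverse=True)

    trace = [{"fn": "withdraw", "token_out": 9_500_000.0},
             {"fn": "deposit",  "token_out": 0.0}]
    print(suspicious_functions(trace, threshold=1e6))  # -> ['withdraw']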
Citations: 0
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-12 DOI: 10.1109/TSE.2025.3579574
Partha Chakraborty;Mahmoud Alfadel;Meiyappan Nagappan
Abstract: Software bugs require developers to expend significant effort to identify and resolve, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is crucial to reducing this effort. Existing bug localization tools, typically reliant on deep learning techniques, face limitations in both cross-project applicability and multi-language environments. Recent advances in Large Language Models (LLMs) offer detailed representations for bug localization that may help overcome these limitations; however, such models are known to struggle with 1) limited context windows and 2) mapping accuracy. To address these challenges, we propose BLAZE, an approach that employs dynamic chunking and hard example learning. First, BLAZE dynamically segments source code to minimize continuity loss. Then, BLAZE fine-tunes a GPT-based model on complex bug reports to enhance cross-project and cross-language bug localization. To support BLAZE, we create the BeetleBox dataset, comprising 23,782 bugs from 29 large, thriving open-source projects across five programming languages (Java, C++, Python, Go, and JavaScript). Our evaluation of BLAZE on three benchmark datasets (BeetleBox, SWE-Bench, and Ye et al.) demonstrates substantial improvements over six state-of-the-art baselines: up to a 120% increase in Top-1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR). Furthermore, an extensive ablation study confirms the contributions of our pipeline components to the overall performance enhancement.
Vol. 51, No. 8, pp. 2254-2267.
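The dynamic-chunking idea can be sketched as splitting a source file at natural boundaries so each chunk fits a model's context window with minimal continuity loss. Blank-line splitting and the whitespace token counter are simplifications; BLAZE's actual segmentation is more sophisticated:

    def dynamic_chunks(source: str, max_tokens: int,
                       count_tokens=lambda s: len(s.split())):
        chunks, current = [], []
        for block in source.split("\n\n"):           # candidate split points
            candidate = "\n\n".join(current + [block])
            if current and count_tokens(candidate) > max_tokens:
                chunks.append("\n\n".join(current))  # close the chunk early
                current = [block]
            else:
                current.append(block)
        if current:
            chunks.append("\n\n".join(current))
        return chunks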
Citations: 0
MBL-CPDP: A Multi-Objective Bilevel Method for Cross-Project Defect Prediction
IF 5.6 | CAS Q1 (Computer Science)
IEEE Transactions on Software Engineering Pub Date: 2025-06-10 DOI: 10.1109/TSE.2025.3577808
Jiaxin Chen;Jinliang Ding;Kay Chen Tan;Jiancheng Qian;Ke Li
Abstract: Cross-project defect prediction (CPDP) leverages machine learning (ML) techniques to proactively identify software defects, especially where project-specific data is scarce. However, existing CPDP approaches suffer from three critical limitations: ineffective exploration of high-dimensional parameter spaces, poor adaptability across diverse projects with heterogeneous data distributions, and inadequate handling of feature redundancy and distribution discrepancies between source and target projects. To address these challenges, we formulate CPDP as a multi-objective bilevel optimization (MBLO) problem and propose a method dubbed MBL-CPDP. Our approach comprises two nested problems: the upper-level problem, a multi-objective combinatorial optimization problem, enhances robustness by optimizing ML pipelines that integrate feature selection, transfer learning, and classification techniques, while the lower-level problem fine-tunes their hyperparameters. Unlike traditional methods that employ fragmented optimization strategies, or single-objective approaches that introduce bias, MBL-CPDP provides a holistic, end-to-end optimization framework. Additionally, we propose an ensemble learning method to better capture cross-project distribution differences and improve generalization across diverse datasets. An MBLO algorithm is then presented to effectively solve the formulated problem. To evaluate MBL-CPDP's performance, we compare it with five automated ML tools and 50 CPDP techniques across 20 projects. Extensive empirical results show that MBL-CPDP outperforms the comparison methods, demonstrating superior adaptability and comprehensive performance.
Vol. 51, No. 8, pp. 2305-2328.
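The nested structure can be sketched as follows: the upper level searches over pipeline combinations (feature selection x transfer learning x classifier) and keeps a Pareto front over multiple objectives, while the lower level tunes each pipeline's hyperparameters. Exhaustive enumeration and weak dominance stand in for the paper's multi-objective evolutionary search:

    import itertools

    def lower_level(pipeline, budget=20):
        """Hypothetical hyperparameter tuning for one pipeline; returns
        (best_params, objective scores to maximize)."""
        raise NotImplementedError

    def weakly_dominates(a, b):
        return all(x >= y for x, y in zip(a, b))

    def upper_level(feature_selectors, transfer_methods, classifiers):
        front = []  # non-dominated (pipeline, params, scores) triples
        for pipeline in itertools.product(feature_selectors,
                                          transfer_methods, classifiers):
            params, scores = lower_level(pipeline)
            front = [f for f in front if not weakly_dominates(scores, f[2])]
            if not any(weakly_dominates(f[2], scores) for f in front):
                front.append((pipeline, params, scores))
        return front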
Citations: 0