IEEE Transactions on Software Engineering: Latest Articles

Multimodal Fusion for Android Malware Detection Based on Large Pre-Trained Models
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-04-03 DOI: 10.1109/TSE.2025.3557577
Xun Li;Lei Liu;Yuzhou Liu;Yu Zhao;Peng Zhang;Huaxiao Liu
Abstract: Malware detection is a critical issue in software engineering because malware directly threatens user information security. Existing approaches often focus on a single modality (either source code or binary code) for detection and fail to exploit the complementary information between the two effectively. This limits detection performance, especially in complex and evasive malware scenarios. In this paper, we take Android applications written in Java as our subjects and propose a novel fine-grained multimodal fusion method that uses large pre-trained models to combine features from source and binary code for malware detection. For the source code modality, we employ the graphical user interface (GUI) as a framework to segment the source code into snippets, and use a pre-trained programming language model to extract feature representations. For the binary code modality, we convert binary code into grayscale images and fine-tune a pre-trained vision model to extract features indirectly. We then implement cross-modal attention and devise a contrastive loss to align features across modalities, supplementing this with a supervised classification loss to refine the multimodal fusion process specifically for malware detection. Our experiments, conducted on the Data-MD and Data-MC benchmarks, demonstrate that our approach achieves a precision of 0.977 and a recall of 0.984 in detecting malware. This underscores the advantages of using large pre-trained models for feature representation and of fusing information across modalities for effective malware detection.
Volume 51, Issue 5, pp. 1569-1590
Citations: 0
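The abstract above mentions converting binary code into grayscale images. A minimal sketch of one common way to do this is shown below; it is an illustration under stated assumptions, not the authors' pipeline. The function name `bytes_to_grayscale`, the fixed row width, and the zero padding are all assumptions made for this example.

```python
def bytes_to_grayscale(data: bytes, width: int = 256) -> list[list[int]]:
    """Map each byte to one pixel intensity (0-255) and wrap into rows.

    Hypothetical helper: the row width and zero padding are assumptions,
    not details taken from the paper.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Pad with zero bytes so the last row is complete.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the byte string into rows of `width` pixels.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    image = bytes_to_grayscale(b"\x00\xff\x10", width=2)
    print(image)
```

The resulting 2-D array of intensities could then be saved as an image and passed to a vision model; how the paper sizes and preprocesses its images is not stated in the abstract.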
NumScout: Unveiling Numerical Defects in Smart Contracts Using LLM-Pruning Symbolic Execution
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-28 DOI: 10.1109/TSE.2025.3555622
Jiachi Chen;Zhenzhe Shao;Shuo Yang;Yiming Shen;Yanlin Wang;Ting Chen;Zhenyu Shan;Zibin Zheng
Abstract: In recent years, the Ethereum platform has witnessed a proliferation of smart contracts, accompanied by exponential growth in total value locked (TVL). High-TVL smart contracts often require complex numerical computations, particularly in the mathematical financial models used by many decentralized applications (DApps). Improper calculations can introduce numerical defects, posing potential security risks. Existing research primarily focuses on traditional numerical defects such as integer overflow, and there is currently a lack of systematic research and effective detection methods targeting new types of numerical defects. In this paper, we identify five new types of numerical defects through the analysis of 1,199 audit reports using the open card method. Each defect is defined and illustrated with a code example to highlight its features and potential consequences. We also propose NumScout, a symbolic execution-based tool designed to detect these five defects. Specifically, the tool combines information from source code and bytecode, analyzing key operations such as comparisons and transfers, to locate defects and report them based on predefined detection patterns. Furthermore, NumScout uses a large language model (LLM) to prune functions that are unrelated to numerical operations. This step allows symbolic execution to reach the target function quickly and improves runtime speed by 28.4%. We ran NumScout on 6,617 real-world contracts and evaluated its performance against manually labeled results. We found that 1,774 contracts contained at least one of the five defects, and the tool achieved an overall precision of 89.7%.
Volume 51, Issue 5, pp. 1538-1553
Citations: 0
Prioritizing Test Gaps by Risk in Industrial Practice: An Automated Approach and Multimethod Study
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-28 DOI: 10.1109/TSE.2025.3556248
Roman Haas;Michael Sailer;Mitchell Joblin;Elmar Juergens;Sven Apel
Abstract: Context. Untested code changes, called test gaps, pose a significant risk for software projects. Since test gaps increase the probability of defects, managing test gaps and their individual risk is important, especially for rapidly changing software systems. Objective. This study aims to understand test gaps in industrial practice and to establish criteria for precisely prioritizing test gaps by their risk, informing practitioners who need to manage, review, and act on larger sets of test gaps. Method. We propose an automated approach for prioritizing test gaps based on key risk criteria. By analyzing 31 historical test gap reviews from 8 industrial software systems of our industrial partners Munich Re and LV 1871, and by conducting semi-structured interviews with the 6 quality engineers who authored those reviews, we validate the transferability of the identified risk criteria, such as code criticality and complexity metrics. Results. Our automated approach exhibits a ranking performance equivalent to expert assessments, in that test gaps labelled as risky in historical test gap reviews are prioritized correctly, on average, at the 30th percentile. In some scenarios, our automated ranking system even outperforms expert assessments, especially for test gaps in central code, a code property that is opaque to non-developers. Conclusion. This research underscores the industrial need for test gap risk estimation techniques that assist test management and quality assurance teams in identifying and addressing critical test gaps. Our multimethod study shows that even a lightweight prioritization approach helps practitioners identify high-risk test gaps efficiently and filter out low-risk test gaps.
Volume 51, Issue 5, pp. 1554-1568
Citations: 0
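To make the idea of ranking test gaps by risk criteria concrete, here is a small sketch of a weighted-score ranking. It is not the authors' approach: the `Gap` record, the two normalized criteria, and the equal weights are hypothetical choices made only for illustration, loosely following the criteria named in the abstract (code criticality and complexity).

```python
from dataclasses import dataclass


@dataclass
class Gap:
    """An untested code change. All fields are hypothetical for this sketch."""
    method: str
    criticality: float  # assumed to be normalized to [0, 1]
    complexity: float   # assumed to be normalized to [0, 1]


def rank_gaps(gaps: list[Gap], w_criticality: float = 0.5,
              w_complexity: float = 0.5) -> list[Gap]:
    """Order gaps from highest to lowest weighted risk score."""
    def score(gap: Gap) -> float:
        return w_criticality * gap.criticality + w_complexity * gap.complexity

    return sorted(gaps, key=score, reverse=True)


if __name__ == "__main__":
    gaps = [Gap("parse", 0.1, 0.2), Gap("pay", 0.9, 0.8), Gap("log", 0.5, 0.5)]
    print([g.method for g in rank_gaps(gaps)])
```

A reviewer would then inspect the list from the top, which matches the stated goal of surfacing high-risk gaps first and filtering out low-risk ones.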
BabelRTS: Polyglot Regression Test Selection
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-27 DOI: 10.1109/TSE.2025.3554403
Gabriele Maurina;Walter Cazzola;Sudipto Ghosh
Abstract: Regression test selection (RTS) approaches reduce the number of regression tests. Current RTS approaches are typically monoglot, i.e., their implementations target a specific language. However, many subjects under test (SUTs) are polyglot, i.e., they use multiple languages. Running multiple monoglot RTS approaches separately on a polyglot SUT is unsafe because tests that involve inter-language dependencies can be missed. Moreover, a new language may require completely reimplementing an RTS approach, especially if the original implementation relies on language and runtime features that are not available in the new language. We propose a new static approach called BabelRTS, which is multilingual (supports multiple languages out of the box), polyglot (analyzes SUTs written in multiple languages), and extensible (allows adding support for new languages). A key contribution is the idea of encapsulating the language-specific aspects of RTS by using patterns and actions. A pattern specifies the programming language constructs in each file that indicate dependencies on other files written in the same or a different language. An action specifies how to identify these files in the codebase. Patterns and actions can be customized to support new languages without modifying the test selection algorithm. BabelRTS is not tied to a specific language runtime system or paradigm. BabelRTS currently supports 12 languages and 5 language combinations. We evaluated BabelRTS on 142 open-source monoglot and polyglot SUTs, analyzing a total of more than two billion LOC. The performance of BabelRTS was similar to that of state-of-the-art monoglot approaches on monoglot SUTs. On polyglot SUTs, BabelRTS was safer in polyglot mode and selected more tests for 60% of the commits than in monoglot mode, which missed inter-language dependencies.
Volume 51, Issue 5, pp. 1487-1499
Citations: 0
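The pattern-and-action idea described in this abstract can be sketched as follows. This is a toy illustration, not BabelRTS itself: the `RULES` table, the regular expressions, the `run_js(...)` call used as a cross-language hook, and the in-memory `files` mapping are all hypothetical. A pattern finds dependency references in a file, an action turns each match into a file name, and a test is selected when a changed file is reachable from it.

```python
import re
from pathlib import PurePosixPath

# Hypothetical (pattern, action) pairs per file extension.
RULES = {
    ".py": [
        (re.compile(r"^import (\w+)", re.MULTILINE), lambda m: f"{m}.py"),
        # Assumed cross-language hook: Python code that runs a JS file.
        (re.compile(r"run_js\('([\w.]+)'\)"), lambda m: m),
    ],
    ".js": [
        (re.compile(r"require\('\./(\w+)'\)"), lambda m: f"{m}.js"),
    ],
}


def dependencies(path: str, files: dict[str, str]) -> set[str]:
    """Apply every pattern for the file's language; keep targets that exist."""
    found = set()
    for pattern, action in RULES.get(PurePosixPath(path).suffix, []):
        for match in pattern.findall(files[path]):
            target = action(match)
            if target in files:
                found.add(target)
    return found


def select_tests(files: dict[str, str], tests: list[str],
                 changed: set[str]) -> list[str]:
    """Select tests whose transitive dependencies include a changed file."""
    selected = []
    for test in tests:
        seen, stack = set(), [test]
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(dependencies(current, files))
        if seen & changed:
            selected.append(test)
    return selected
```

In this sketch, supporting a new language only means adding an entry to `RULES`; the selection loop stays unchanged, which mirrors the extensibility claim in the abstract.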
Programmer Visual Attention During Context-Aware Code Summarization
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-26 DOI: 10.1109/TSE.2025.3554990
Robert Wallace;Aakash Bansal;Zachary Karas;Ningzhi Tang;Yu Huang;Toby Jia-Jun Li;Collin McMillan
Abstract: Programmer attention represents the visual focus of programmers on parts of the source code in pursuit of programming tasks. Current research on modeling programmer attention has focused on using mouse cursors, keystrokes, or eye tracking equipment to map areas in a snippet of code. These approaches have traditionally mapped attention only for a single method. However, there is a knowledge gap in the literature because programming tasks such as source code summarization require programmers to use contextual knowledge that can only be found in other parts of the project, not in a single method alone. To address this knowledge gap, we conducted an in-depth human study with 10 Java programmers, in which each programmer generated summaries for 40 methods from five large Java projects over five one-hour sessions. We used eye tracking equipment to map the visual attention of programmers while they wrote the summaries. We also rated the quality of each summary. We found eye-gaze patterns and metrics that define common behaviors in programmer attention during context-aware code summarization. Specifically, we found that programmers need to read up to 35% fewer words ($p < 0.01$) over the whole session, and revisit 13% fewer words ($p < 0.03$) as they summarize each method during a session, while maintaining the quality of summaries. We also found that the amount of source code a participant looks at correlates with a higher quality summary, but this trend follows a bell-shaped curve, such that after a threshold, reading more source code leads to a significant decrease ($p < 0.01$) in the quality of summaries. We also gathered insight into the types of methods in the project that provide the most contextual information for code summarization based on programmer attention. Specifically, we observed that programmers spent a majority of their time looking at methods inside the same class as the target method to be summarized. Surprisingly, we found that programmers spent significantly less time looking at methods in the call graph of the target method. We discuss how our empirical observations may aid future studies on modeling programmer attention and improving context-aware automatic source code summarization.
Volume 51, Issue 5, pp. 1524-1537
Citations: 0
LLM-Based Automation of COSMIC Functional Size Measurement From Use Cases
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-26 DOI: 10.1109/TSE.2025.3554562
Gabriele De Vito;Sergio Di Martino;Filomena Ferrucci;Carmine Gravino;Fabio Palomba
Abstract: COmmon Software Measurement International Consortium (COSMIC) Functional Size Measurement is a method widely used in the software industry to quantify user functionality and measure software size, which is crucial for estimating development effort, cost, and resource allocation. COSMIC measurement is a manual task that requires qualified professionals and considerable effort. To support professionals in COSMIC measurement, we propose an automatic approach, CosMet, that leverages Large Language Models to measure software size starting from use cases specified in natural language. To evaluate the proposed approach, we developed a web tool that implements CosMet using GPT-4 and conducted two studies to assess the approach quantitatively and qualitatively. First, we applied CosMet to seven software systems, encompassing 123 use cases, and compared the generated results with the ground truth created by two certified professionals. Then, seven professional measurers evaluated the analysis produced by CosMet and the extent to which the approach reduces measurement time. The first study's results revealed that CosMet is highly effective in analyzing and measuring use cases. The second study highlighted that CosMet offers a transparent and interpretable analysis, allowing practitioners to understand how the measurement is derived and make necessary adjustments. Additionally, it reduces manual measurement time by 60-80%.
Volume 51, Issue 5, pp. 1500-1523
Citations: 0
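For readers unfamiliar with COSMIC, the size of a functional process is counted in COSMIC Function Points (CFP), with each data movement (Entry, Exit, Read, or Write) contributing 1 CFP. The sketch below shows only this final counting step; it says nothing about how CosMet identifies data movements from use-case text. The function name and the "log in" example are hypothetical.

```python
MOVEMENT_TYPES = {"Entry", "Exit", "Read", "Write"}


def cosmic_size(movements: list[str]) -> int:
    """Return the size in CFP: one point per identified data movement."""
    unknown = set(movements) - MOVEMENT_TYPES
    if unknown:
        raise ValueError(f"unknown data movement types: {sorted(unknown)}")
    return len(movements)


if __name__ == "__main__":
    # Hypothetical "log in" functional process:
    # credentials arrive (Entry), the user record is fetched (Read),
    # and a result message is shown (Exit).
    print(cosmic_size(["Entry", "Read", "Exit"]), "CFP")
```

The hard part of a measurement, and the part the paper automates, is deciding which data movements a use case contains; once that list exists, the arithmetic is a simple count.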
RECOVER: Toward Requirements Generation From Stakeholders’ Conversations
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-21 DOI: 10.1109/TSE.2025.3572056
Gianmario Voria;Francesco Casillo;Carmine Gravino;Gemma Catolino;Fabio Palomba
Abstract: Stakeholders’ conversations in requirements elicitation meetings hold valuable insights into system and client needs. However, manually extracting requirements is time-consuming, labor-intensive, and prone to errors and biases. While current state-of-the-art methods assist in summarizing stakeholder conversations and classifying requirements based on their nature, there is a noticeable lack of approaches capable of both identifying requirements within these conversations and generating corresponding system requirements. Such approaches would assist requirement identification, reducing engineers’ workload, time, and effort. They would also enhance accuracy and consistency in documentation, providing a reliable foundation for further analysis. To address this gap, this paper introduces RECOVER (Requirements EliCitation frOm conVERsations), a novel conversational requirements engineering approach that leverages natural language processing and large language models (LLMs) to support practitioners in automatically extracting system requirements from stakeholder interactions by analyzing individual conversation turns. The approach is evaluated using a mixed-method research design that combines statistical performance analysis with a user study involving requirements engineers, targeting two levels of granularity. First, at the conversation turn level, the evaluation measures RECOVER’s accuracy in identifying requirements-relevant dialogue and the quality of generated requirements in terms of correctness, completeness, and actionability. Second, at the entire conversation level, the evaluation assesses the overall usefulness and effectiveness of RECOVER in synthesizing comprehensive system requirements from full stakeholder discussions. Empirical evaluation of RECOVER shows promising performance, with generated requirements demonstrating satisfactory correctness, completeness, and actionability. The results also highlight the potential of automating requirements elicitation from conversations as an aid that enhances efficiency while maintaining human oversight.
Volume 51, Issue 6, pp. 1912-1933
Citations: 0
Do Experts Agree About Smelly Infrastructure?
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-21 DOI: 10.1109/TSE.2025.3553383
Sogol Masoumzadeh;Nuno Saavedra;Rungroj Maipradit;Lili Wei;João F. Ferreira;Dániel Varró;Shane McIntosh
Abstract: Code smells are anti-patterns that violate code understandability, re-usability, changeability, and maintainability. It is important to identify code smells and locate them in the code. For this purpose, automated detection of code smells is a sought-after feature for development tools; however, the design and evaluation of such tools depend on the quality of oracle datasets. The typical approach for creating an oracle dataset involves multiple developers independently inspecting and annotating code examples for the code smells they contain. Since multiple inspectors cast votes about each code example, it is possible for the inspectors to disagree about the presence of smells. Such disagreements introduce ambiguity into how smells should be interpreted. Prior work has studied developer perceptions of code smells in traditional source code; however, smells in Infrastructure-as-Code (IaC) have not been investigated. To understand the real-world impact of disagreements among developers and their perceptions of IaC code smells, we conduct an empirical study on the oracle dataset of GLITCH, a state-of-the-art detection tool for security code smells in IaC. We analyze GLITCH's oracle dataset for code smell issues, their types, and the individual annotations of the inspectors. Furthermore, we investigate possible confounding factors associated with the incidence of misaligned developer perceptions of IaC code smells. Finally, we triangulate developer perceptions of code smells in traditional source code with our results on IaC. Our study reveals that, unlike developer perceptions of smells in traditional source code, perceptions of smells in IaC are more substantially impacted by subjective interpretation of smell types and their co-occurrence relationships. For instance, the interpretations of admins by default, empty passwords, and hard-coded secrets vary considerably among raters, and these smells are more susceptible to misidentification than other IaC code smells. Consequently, the manual identification of IaC code smells involves annotation disagreements among developers: 46.3% of the studied IaC code smell incidences have at least one dissenting vote among three inspectors. Meanwhile, only 1.6% of code smell incidences in traditional source code are affected by inspector bias stemming from these disagreements. Hence, relying solely on majority voting would not fully represent the breadth of interpretation of the IaC under scrutiny.
Volume 51, Issue 5, pp. 1472-1486
Citations: 0
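The distinction this abstract draws between a majority-vote label and the presence of a dissenting vote can be shown with a short sketch. The function below is a hypothetical helper written for this listing, not code from the study: it takes one boolean vote per inspector ("smell present" or not) and reports both the majority label and whether the inspectors were unanimous.

```python
from collections import Counter


def majority_and_dissent(votes: list[bool]) -> tuple[bool, bool]:
    """Return (majority label, whether at least one inspector dissented)."""
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    # A smell is labeled present when "present" votes outnumber "absent" votes.
    majority = counts[True] > counts[False]
    # Dissent exists when the votes are not unanimous.
    has_dissent = 0 < counts[True] < len(votes)
    return majority, has_dissent


if __name__ == "__main__":
    # Three inspectors, one of whom disagrees.
    print(majority_and_dissent([True, True, False]))
```

Two code examples can receive the same majority label while only one of them was contested, which is the information a majority-only oracle discards.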
Understanding and Identifying Technical Debt in the Co-Evolution of Production and Test Code
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-19 DOI: 10.1109/TSE.2025.3553112
Yimeng Guo;Zhifei Chen;Lu Xiao;Lin Chen;Yanhui Li;Yuming Zhou
Abstract: The co-evolution of production and test code (PT co-evolution) has received increasing attention in recent years. However, we found that existing work did not comprehensively study various PT co-evolution scenarios, such as the qualification and persistence of their effects on software. Inspired by technical debt (TD), we refer to TD generated during the co-evolution between production and test code as PT co-evolution technical debt (PTCoTD). To better understand PT co-evolution, we first conducted an exploratory study of its characteristics on 15 open-source projects, finding that unbalanced PT co-evolution is prevalent and summarizing five potential PT flaws. We then proposed an approach to identify and quantify PTCoTDs of these flaw patterns, considering evolutionary and structural relationships. We also built prediction models to describe cost trajectories and rank all PTCoTDs so that expensive ones are prioritized. The evaluation on the 15 projects shows that our approach can identify PTCoTDs that deserve attention. The identified PTCoTDs account for about half of the projects' total maintenance costs, and the cost proportion of the expensive Top-5 is 1.8x more than the file proportion they contain. Almost all covered maintenance costs persist as PTCoTD in the future, with an average increase of 6.8% between the last two releases. Our approach also accurately predicts the costs of PTCoTD, with an average prediction deviation of only 8.3%. Our study provides valuable insights into PT co-evolution scenarios and their effects, which can guide practice and inspire future work on software testing and maintenance.
Volume 51, Issue 5, pp. 1415-1436
Citations: 0
FlexFL: Flexible and Effective Fault Localization With Open-Source Large Language Models
IF 6.5, Zone 1, Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-03-19 DOI: 10.1109/TSE.2025.3553363
Chuyang Xu;Zhongxin Liu;Xiaoxue Ren;Gehao Zhang;Ming Liang;David Lo
Abstract: Fault localization (FL) aims to identify bug locations within a software system, which can enhance debugging efficiency and improve software quality. Due to the impressive code comprehension ability of Large Language Models (LLMs), a few studies have proposed leveraging LLMs to locate bugs, i.e., LLM-based FL, and have demonstrated promising performance. However, these methods have two limitations. First, they are limited in flexibility: they rely on bug-triggering test cases to perform FL and cannot make use of other available bug-related information, e.g., bug reports. Second, they are built upon proprietary LLMs, which, although powerful, carry data privacy risks. To address these limitations, we propose a novel LLM-based FL framework named FlexFL, which can flexibly leverage different types of bug-related information and work effectively with open-source LLMs. FlexFL is composed of two stages. In the first stage, FlexFL reduces the search space of buggy code using state-of-the-art FL techniques of different families and provides a candidate list of bug-related methods. In the second stage, FlexFL leverages LLMs to delve deeper, double-checking the code snippets of the methods suggested by the first stage and refining the fault localization results. In each stage, FlexFL constructs agents based on open-source LLMs, which share the same pipeline that does not postulate any type of bug-related information and can interact with function calls without the out-of-the-box capability. Extensive experimental results on Defects4J demonstrate that FlexFL outperforms the baselines and can work with different open-source LLMs. Specifically, FlexFL with the lightweight open-source LLM Llama3-8B can locate 42 and 63 more bugs than two state-of-the-art LLM-based FL approaches, AutoFL and AgentFL, which both use GPT-3.5. In addition, FlexFL can localize at the top 1 position 93 bugs that cannot be localized by non-LLM-based FL techniques. Furthermore, to mitigate potential data contamination, we conduct experiments on a dataset that Llama3-8B has not seen before, and the evaluation results show that FlexFL can also achieve good performance.
Volume 51, Issue 5, pp. 1455-1471
Citations: 0