{"title":"Question Selection for Multimodal Code Search Synthesis Using Probabilistic Version Spaces","authors":"Jiarong Wu;Yanyan Jiang;Lili Wei;Congying Xu;Shing-Chi Cheung;Chang Xu","doi":"10.1109/TSE.2025.3565387","DOIUrl":"10.1109/TSE.2025.3565387","url":null,"abstract":"Searching the occurrences of specific code patterns (code search) is a common task in software engineering, and programming by example (PBE) techniques have been applied to ease customizing code patterns. However, previous PBE tools only synthesize programs meeting the input-output examples, which may not always align with the user intent. To bridge this gap, this paper proposes <sc>Excalibur</small>, a multi-modal (example and natural language description) and interactive synthesizer for code search. <sc>Excalibur</small> ensures that the generated programs are correct for the provided examples (soundness) and include the user-intended program (bounded completeness). Furthermore, <sc>Excalibur</small> helps the user identify the user-intended program through question-answer interaction. To minimize the required interaction efforts, question selection is crucial. To improve question selection for code search, we propose probabilistic version spaces (ProbVS), in which the user-intended program’s probability is high and others are low. ProbVS combines traditional version spaces for compactly representing extensive programs and large language models (on the user-provided natural language description) for adjusting programs’ probabilities to align with users’ intents. Extensive experiments on a benchmark of 44 tasks demonstrated the effectiveness of <sc>Excalibur</small> and ProbVS and demystified how ProbVS affects probability distributions and how the configurable parameters affect ProbVS.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 6","pages":"1724-1744"},"PeriodicalIF":6.5,"publicationDate":"2025-04-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143889991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DeepVec: State-Vector Aware Test Case Selection for Enhancing Recurrent Neural Network","authors":"Zhonghao Jiang;Meng Yan;Li Huang;Weifeng Sun;Chao Liu;Song Sun;David Lo","doi":"10.1109/TSE.2025.3565037","DOIUrl":"10.1109/TSE.2025.3565037","url":null,"abstract":"Deep Neural Networks (DNN) have realized significant achievements across various application domains. There is no doubt that testing and enhancing a pre-trained DNN that has been deployed in an application scenario is crucial, because it can reduce the failures of the DNN. DNN-driven software testing and enhancement require large amounts of labeled data. The high cost and inefficiency caused by the large volume of data of manual labeling, and the time consumption of testing all cases in real scenarios are unacceptable. Therefore, test case selection technologies are proposed to reduce the time cost by selecting and only labeling representative test cases without compromising testing performance. Test case selection based on neuron coverage (NC) or uncertainty metrics has achieved significant success in Convolutional Neural Networks (CNN) testing. However, it is challenging to transfer these methods to Recurrent Neural Networks (RNN), which excel at text tasks, due to the mismatch in model output formats and the reliance on image-specific characteristics. What’s more, balancing the execution cost and performance of the algorithm is also indispensable. In this paper, we propose a state-vector aware test case selection method for RNN models, namely DeepVec, which reduces the cost of data labeling and saves computing resources and balances the execution cost and performance. DeepVec selects data using uncertainty metric based on the norm of the output vector at each time step (i.e., state-vector), and similarity metric based on the direction angle of the state-vector. Because test cases with smaller state-vector norms often possess greater information entropy and similar changes of state-vector direction angle indicate similar RNN internal states. These metrics can be calculated with just a single inference, which gives it strong bug detection and model improvement capabilities. We evaluate DeepVec on five popular datasets, containing images and texts as well as commonly used 3 RNN classification models, and compare it with NC-based, uncertainty-based, and other black-box methods. Experimental results demonstrate that DeepVec achieves an average relative improvement of 12.5%-118.22% over baseline methods in selecting fault-revealing test cases with time costs reduced to only 1% to 1‱. At the same time, we find that the absolute accuracy improvement after retraining outperforms baseline methods by 0.29%-24.01% when selecting 15% data to retrain.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 6","pages":"1702-1723"},"PeriodicalIF":6.5,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143884684","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLMorpheus: Mutation Testing Using Large Language Models","authors":"Frank Tip;Jonathan Bell;Max Schäfer","doi":"10.1109/TSE.2025.3562025","DOIUrl":"10.1109/TSE.2025.3562025","url":null,"abstract":"In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program’s tests detect them. Most existing approaches for mutation testing involve the application of a fixed set of mutation operators, e.g., replacing a “+” with a “-”, or removing a function’s body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique for mutation testing where placeholders are introduced at designated locations in a program’s source code and where a Large Language Model (LLM) is prompted to ask what they could be replaced with. The technique is implemented in <italic>LLMorpheus</i>, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find <italic>LLMorpheus</i> to be capable of producing mutants that resemble existing bugs that cannot be produced by <italic>StrykerJS</i>, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by <italic>LLMorpheus</i>, demonstrating its practicality.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 6","pages":"1645-1665"},"PeriodicalIF":6.5,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876139","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Applicability of Code Language Models to Scientific Computing Programs","authors":"Qianhui Zhao;Fang Liu;Xiao Long;Chengru Wu;Li Zhang","doi":"10.1109/TSE.2025.3564599","DOIUrl":"10.1109/TSE.2025.3564599","url":null,"abstract":"Scientific Computing Programming Languages (SCPLs), like MATLAB and R, are popular and widely used for computational mathematics. In recent years, pre-trained code language models (CLMs) have automated many code-related tasks, covering various general programming languages. SCPLs share many similarities with general programming languages, including similar syntactic structures and the semantics of identifiers. Despite the similarities, there exist many differences between them. For example, lots of numerical operations and dedicated libraries exist in SCPLs. However, there has been little comprehensive work analyzing CLMs’ capabilities in the understanding and generation of pragmatic scientific computing programs. To this end, we investigate the applicability of code language models for the SCPL analysis, especially focusing on real-world code in open-source repositories. We first create a benchmark that contains programs and documentation from three widely used scientific computing programming languages, then perform an adequate evaluation of existing advanced code language models on both code understanding and generation tasks using the new benchmark, and study the relations of different training strategies, model types, and model sizes to the performance of different tasks and languages. Evaluation results confirm that, compared to general programming languages, SCPLs are more challenging to understand, and especially to generate, but the use of code language models is nevertheless feasible, and the knowledge obtained from the general languages can be transferred to SCPL analysis. A deeper analysis reveals additional challenges in generating code that incorporates API calls relevant to computational mathematics. We believe that our findings can provide guidance on improving tooling and analyses for the scientific programming languages, and also inspire and motivate researchers to improve the robustness of existing code language models.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 6","pages":"1685-1701"},"PeriodicalIF":6.5,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876140","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Testing CPS With Design Assumptions-Based Metamorphic Relations and Genetic Programming","authors":"Claudio Mandrioli;Seung Yeob Shin;Domenico Bianculli;Lionel Briand","doi":"10.1109/TSE.2025.3563121","DOIUrl":"10.1109/TSE.2025.3563121","url":null,"abstract":"Cyber-Physical Systems (CPSs) software is used to enforce desired behaviours on physical systems. To test the interaction between the CPS software and the system’s physics, engineers provide traces of desired physical states and observe traces of the actual physical states. CPS requirements describe how closely the actual physical traces should track the desired traces. These requirements are typically defined for specific, simple input traces such as step or ramp sequences, and thus are not applicable to arbitrary inputs. This limits the availability of oracles for CPSs. Our recent work proposes an approach to testing CPSs using control-theoretical design assumptions instead of requirements. This approach circumvents the oracle problem by leveraging the control-theoretical guarantees that are provided when the design assumptions are satisfied. To address the test case generation and oracle problems, researchers have proposed metamorphic testing, which is based on the study of relations across tests, i.e., metamorphic relations (MRs). In this work, we define MRs based on the design assumptions and explore combinations of these MRs using genetic programming to generate CPS test cases. This enables the generation of CPS input traces with potentially arbitrary shapes, together with associated expected output traces. We use the deviation from the expected output traces to guide the generation of input traces that falsify the MRs. Our experiment results show that the MR-falsification provides engineers with new information, helping them identify passed and failed test cases. Furthermore, we show that the generation of traces that falsify the MRs is a non-trivial problem, which cannot be addressed with a random generation approach but is successfully addressed by our approach based on genetic search.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 6","pages":"1666-1684"},"PeriodicalIF":6.5,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10976605","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143873070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Systematic Study on Real-World Android App Bundles","authors":"Yutian Tang;Xiapu Luo;Yuming Zhou","doi":"10.1109/TSE.2025.3560026","DOIUrl":"10.1109/TSE.2025.3560026","url":null,"abstract":"Android app developers currently mainly attempt to merge all functions into one app to fit different types of devices. However, this “one-size-fits-all” strategy can introduce various problems to both developers and end-users, such as slower download speed, and a larger attack surface. To resolve this issue, Google promotes the App Bundle framework and requires all new apps must adopt this framework after August 2021. The app bundle framework allows developers to organize their apps in modules. As a new framework, building an app bundle can be time-consuming and error-prone for developers. To fill this gap, in this paper, we discuss how developers build app bundles in practice. By investing in over 200,000 apps from Google Play, we find that 30% of apps have already adopted app bundles. The adoption ratio of large-size apps is even higher than 90%. We also find hands-on programming practices for building feature modules and dynamic assets in app bundles. This study also finds 12 common design practices, which assist developers in building app bundles.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 5","pages":"1615-1628"},"PeriodicalIF":6.5,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143822716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Retrieval-Augmented Fine-Tuning for Improving Retrieve-and-Edit Based Assertion Generation","authors":"Hongyan Li;Weifeng Sun;Meng Yan;Ling Xu;Qiang Li;Xiaohong Zhang;Hongyu Zhang","doi":"10.1109/TSE.2025.3558403","DOIUrl":"10.1109/TSE.2025.3558403","url":null,"abstract":"Unit Testing is crucial in software development and maintenance, aiming to verify that the implemented functionality is consistent with the expected functionality. A unit test is composed of two parts: a test prefix, which drives the unit under test to a particular state, and a test assertion, which determines what the expected behavior is under that state. To reduce the effort of conducting unit tests manually, Yu et al. proposed an integrated approach (<i>integration</i> for short), combining information retrieval with a deep learning-based approach to generate assertions for test prefixes, and obtained promising results. In our previous work, we found that the overall performance of <i>integration</i> is mainly due to its success in retrieving assertions. Moreover, <i>integration</i> is limited to specific types of edit operations and struggles to understand the semantic differences between the retrieved focal-test (<i>focal-test</i> includes a test prefix and a unit under test) and the input focal-test. Based on these insights, we then proposed a retrieve-and-edit approach named <small>EditAS</small> to learn the assertion edit patterns to improve the effectiveness of assertion generation in our prior study. Despite being promising, we find that the effectiveness of <small>EditAS</small> can be further improved. Our analysis shows that: ① The editing ability of <small>EditAS</small> still has ample room for improvement. Its performance degrades as the edit distance between the retrieval assertion and ground truth increases. Specifically, the average accuracy of <small>EditAS</small> is <inline-formula><tex-math>$12.38%$</tex-math></inline-formula> when the edit distance is greater than 5. ② <small>EditAS</small> lacks a fine-grained semantic understanding of both the retrieved focal-test and the input focal-test themselves, which leads to many inaccurate token modifications. In particular, an average of 25.57% of the incorrectly generated assertions that need to be modified are not modified, and an average of 6.45% of the assertions that match the ground truth are still modified. Thanks to pre-trained models employing pre-training paradigms on large-scale data, they tend to have good semantic comprehension and code generation abilities. In light of this, we propose <inline-formula><tex-math>$EditAS^{2}$</tex-math></inline-formula>, which improves retrieval-and-edit based assertion generation through retrieval-augmented fine-tuning. Specifically, <inline-formula><tex-math>$EditAS^{2}$</tex-math></inline-formula> first retrieves a similar focal-test from a predefined corpus and treats its assertion as a prototype. Then, <inline-formula><tex-math>$EditAS^{2}$</tex-math></inline-formula> uses a pre-trained model, CodeT5, to learn the semantics of the input and similar focal-tests as well as assertion editing patterns to automatically edit the prototype. 
We first evaluate the <inline-formula><tex-math>$EditAS^{2}$</tex-math></inline-formula> for i","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"51 5","pages":"1591-1614"},"PeriodicalIF":6.5,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143797837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
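A schematic of the retrieve-and-edit flow described in this abstract: retrieve the most similar focal-test from a corpus, treat its assertion as a prototype, and let a fine-tuned seq2seq model (CodeT5 in the paper) edit the prototype conditioned on both focal-tests. The Jaccard retriever and the prompt layout are assumptions for illustration; the paper's retrieval and fine-tuning setup may differ.

```python
# Sketch of retrieval-augmented assertion generation. edit_model stands in
# for a fine-tuned seq2seq model; names and prompt format are hypothetical.
from typing import Callable, List, Tuple

def jaccard(a: str, b: str) -> float:
    """Token-set similarity between two code snippets."""
    ta, tb = set(a.split()), set(b.split())
    return len(ta & tb) / (len(ta | tb) or 1)

def retrieve(focal_test: str,
             corpus: List[Tuple[str, str]]) -> Tuple[str, str]:
    """Return the (focal_test, assertion) pair most similar to the input."""
    return max(corpus, key=lambda pair: jaccard(focal_test, pair[0]))

def generate_assertion(focal_test: str, corpus: List[Tuple[str, str]],
                       edit_model: Callable[[str], str]) -> str:
    """Concatenate the input, the retrieved focal-test, and the prototype
    assertion so the fine-tuned model can copy what fits and edit the rest."""
    sim_ft, prototype = retrieve(focal_test, corpus)
    prompt = f"INPUT: {focal_test}\nSIMILAR: {sim_ft}\nPROTOTYPE: {prototype}"
    return edit_model(prompt)
```

Feeding both focal-tests to the model, rather than only edit operations, is what lets it notice fine-grained semantic differences and avoid the unnecessary token modifications the abstract reports.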