IEEE Transactions on Software Engineering — Latest Articles

Malo in the Code Jungle: Explainable Fault Localization for Decentralized Applications
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-06-13 DOI: 10.1109/TSE.2025.3578816
Hui Zhang;Jiajing Wu;Zhiying Wu;Zhe Chen;Dan Lin;Jiachi Chen;Yuren Zhou;Zibin Zheng
Decentralized applications (DApps) have long been sitting ducks for hackers due to their valuable cryptocurrency assets, exposing them to various security risks. When a DApp is attacked, promptly identifying faults is crucial to minimizing financial losses and ensuring effective fault repair. However, existing fault localization methods, which mostly rely on code coverage, often fall short for DApps, particularly when dealing with only one fault case. Furthermore, according to a prior survey, most developers expect fault localization tools to provide reasonable explanations. In this paper, we present Malo, a method for DApp-specific explainable fault localization. It identifies fault functions through suspicious token transfer-guided analysis, and then employs Large Language Models (LLMs) to generate explanations for these identified fault functions. Specifically, Malo examines function call traces and source code of fault cases to acquire internal knowledge, and also retrieves relevant project documents from the Web to obtain external knowledge. By integrating internal and external knowledge, Malo generates reasonable explanations for faults in DApps. Our evaluation on a dataset of 68 real-world DApp faults demonstrates that Malo locates 62% of faults within the Top-5, 9% higher than the state-of-the-art method. The experimental results also show an alignment accuracy of 71% between the explanations generated by Malo and the ground truth. In addition, we conduct a user study, which confirms that explanations generated by Malo can aid developers in comprehending the root cause of faults. Our code and dataset are available online: https://github.com/SodalimeZero/Malo_Code.git

Vol. 51, no. 7, pp. 2197–2210.
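The suspicious token transfer-guided analysis is only described at a high level in the abstract. A minimal sketch of the general idea, ranking candidate functions by how often they appear in flagged transfers (all names and the trace format are hypothetical, not from the paper):

```python
# Hypothetical sketch: rank DApp functions by their involvement in
# token transfers flagged as suspicious in a fault case's call trace.
from collections import Counter

def rank_fault_functions(call_trace, suspicious_transfers):
    """call_trace: list of (function_name, transfer_id) events.
    suspicious_transfers: set of transfer ids flagged as anomalous."""
    scores = Counter()
    for func, transfer in call_trace:
        if transfer in suspicious_transfers:
            scores[func] += 1
    # Higher score = more involvement in suspicious transfers.
    return [f for f, _ in scores.most_common()]

trace = [("swap", "t1"), ("withdraw", "t2"), ("withdraw", "t3"), ("mint", "t4")]
print(rank_fault_functions(trace, {"t2", "t3"}))  # ['withdraw']
```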
Cited by: 0
BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning
IF 5.6 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-06-12 DOI: 10.1109/TSE.2025.3579574
Partha Chakraborty;Mahmoud Alfadel;Meiyappan Nagappan
Software bugs require developers to expend significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is crucial in reducing this effort. Existing bug localization tools, typically reliant on deep learning techniques, face limitations in both cross-project applicability and multi-language environments. Recent advancements with Large Language Models (LLMs) offer detailed representations for bug localization that may help to overcome such limitations. However, these models are known to encounter challenges with 1) limited context windows and 2) mapping accuracy. To address these challenges, we propose BLAZE, an approach that employs dynamic chunking and hard example learning. First, BLAZE dynamically segments source code to minimize continuity loss. Then, BLAZE fine-tunes a GPT-based model using complex bug reports in order to enhance cross-project and cross-language bug localization. To support the capability of BLAZE, we create the BeetleBox dataset, which comprises 23,782 bugs from 29 large and thriving open-source projects across five programming languages (Java, C++, Python, Go, and JavaScript). Our evaluation of BLAZE on three benchmark datasets (BeetleBox, SWE-Bench, and the dataset of Ye et al.) demonstrates substantial improvements compared to six state-of-the-art baselines. Specifically, BLAZE achieves up to a 120% increase in Top-1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR). Furthermore, an extensive ablation study confirms the contributions of our pipeline components to the overall performance enhancement.

Vol. 51, no. 8, pp. 2254–2267.
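The abstract does not spell out how dynamic chunking works; one plausible reading is that code is split only at natural boundaries so no chunk cuts through a coherent block. A minimal sketch under that assumption (the function name, the blank-line boundary heuristic, and the line budget are all hypothetical):

```python
# Hypothetical sketch of dynamic chunking: split a source file at natural
# boundaries (blank-line-separated blocks) so each chunk fits a model's
# context budget, rather than cutting at fixed offsets mid-function.
def dynamic_chunks(source, max_lines=50):
    blocks = [b for b in source.split("\n\n") if b.strip()]
    chunks, current = [], []
    for block in blocks:
        lines = block.count("\n") + 1
        used = sum(b.count("\n") + 1 for b in current)
        if current and used + lines > max_lines:
            chunks.append("\n\n".join(current))  # flush the full chunk
            current = []
        current.append(block)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

print(dynamic_chunks("a\nb\n\nc\n\nd", max_lines=3))
```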
Cited by: 0
MBL-CPDP: A Multi-Objective Bilevel Method for Cross-Project Defect Prediction
IF 5.6 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-06-10 DOI: 10.1109/TSE.2025.3577808
Jiaxin Chen;Jinliang Ding;Kay Chen Tan;Jiancheng Qian;Ke Li
Cross-project defect prediction (CPDP) leverages machine learning (ML) techniques to proactively identify software defects, especially where project-specific data is scarce. However, existing CPDP approaches suffer from three critical limitations: ineffective exploration of high-dimensional parameter spaces, poor adaptability across diverse projects with heterogeneous data distributions, and inadequate handling of feature redundancy and distribution discrepancies between source and target projects. To address these challenges, we formulate CPDP as a multi-objective bilevel optimization (MBLO) method, dubbed MBL-CPDP. Our approach comprises two nested problems: the upper-level problem, a multi-objective combinatorial optimization problem, enhances robustness by optimizing ML pipelines that integrate feature selection, transfer learning, and classification techniques, while the lower-level problem fine-tunes their hyperparameters. Unlike traditional methods that employ fragmented optimization strategies or single-objective approaches that introduce bias, MBL-CPDP provides a holistic, end-to-end optimization framework. Additionally, we propose an ensemble learning method to better capture cross-project distribution differences and improve generalization across diverse datasets. An MBLO algorithm is then presented to effectively solve the formulated MBLO problem. To evaluate MBL-CPDP's performance, we compare it with five automated ML tools and 50 CPDP techniques across 20 projects. Extensive empirical results show that MBL-CPDP outperforms the comparison methods, demonstrating its superior adaptability and comprehensive performance evaluation capability.

Vol. 51, no. 8, pp. 2305–2328.
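The nested structure of a bilevel formulation can be illustrated with a toy example: the upper level chooses a pipeline, the lower level tunes hyperparameters for that fixed choice. This is only a structural sketch with exhaustive search and a made-up score function; the actual MBL-CPDP uses multi-objective evolutionary optimization:

```python
# Structural sketch of bilevel optimization (not the paper's algorithm):
# upper level picks a pipeline, lower level tunes its hyperparameters.
from itertools import product

def lower_level(pipeline, score):
    # Tune hyperparameters (here: a toy grid of (C, k) pairs) for a
    # fixed pipeline choice, returning the best setting and its score.
    best = max(product([0.1, 1.0, 10.0], [3, 5, 7]),
               key=lambda hp: score(pipeline, hp))
    return best, score(pipeline, best)

def upper_level(pipelines, score):
    # For each candidate pipeline, solve the lower-level problem,
    # then keep the pipeline whose tuned score is best.
    results = {p: lower_level(p, score) for p in pipelines}
    return max(results.items(), key=lambda kv: kv[1][1])

# Toy score standing in for cross-project prediction quality.
toy = lambda p, hp: {"pca+svm": 0.6, "mi+rf": 0.7}[p] + 0.01 * hp[1] - 0.05 * hp[0]
print(upper_level(["pca+svm", "mi+rf"], toy))
```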
Cited by: 0
GNNContext: GNN-based Code Context Prediction for Programming Tasks
IF 5.6 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-06-10 DOI: 10.1109/TSE.2025.3578390
Xiaoye Zheng;Zhiyuan Wan;Shun Liu;Kaiwen Yang;David Lo;Xiaohu Yang
A code context model comprises source code elements and their relations relevant to a programming task. The capture and use of code context models in software tools can benefit software development practices, such as code navigation and search. Prior research has explored approaches that leverage either the structural information of code or interaction histories of developers with integrated development environments to automate the construction of code context models. However, these approaches primarily capture shallow syntactic and lexical features of code elements, with limited ability to capture contextual and structural dependencies among neighboring code elements. In this paper, we propose GNNContext, a novel approach for predicting code context models based on Graph Neural Networks. Our approach leverages code representation learning models to capture both the syntactic and semantic features of code elements, while employing Graph Neural Networks to learn the structural and contextual information among neighboring code elements in the code context models. To evaluate the effectiveness of our approach, we apply it to a dataset comprising 3,879 code context models that we derive from three Eclipse open-source projects. The evaluation results demonstrate that our proposed approach GNNContext can significantly outperform the state-of-the-art baseline for code context prediction, achieving average improvements of 62.79%, 56.60%, 73.50% and 81.89% in mean reciprocal rank, Top-1, Top-3, and Top-5 recall rates, respectively, across predictions of varying steps. Moreover, our approach demonstrates robust performance in a cross-project evaluation setting.

Vol. 51, no. 8, pp. 2268–2284.
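The metrics reported here (mean reciprocal rank and Top-k recall) follow standard definitions, which can be computed as below. This is generic evaluation code, not code from the paper:

```python
# Standard ranking metrics: mean reciprocal rank and Top-k recall.
# Each query has a ranked prediction list and a set of ground-truth elements.
def mrr(ranked_lists, truths):
    total = 0.0
    for ranked, truth in zip(ranked_lists, truths):
        # Reciprocal rank of the first correct prediction, 0 if none found.
        rank = next((i + 1 for i, x in enumerate(ranked) if x in truth), None)
        total += 1.0 / rank if rank else 0.0
    return total / len(ranked_lists)

def topk_recall(ranked_lists, truths, k):
    # Fraction of ground-truth elements recovered in the top k, averaged.
    hits = sum(len(set(r[:k]) & t) / len(t) for r, t in zip(ranked_lists, truths))
    return hits / len(ranked_lists)

ranked = [["a", "b", "c"], ["x", "y", "z"]]
truth = [{"b"}, {"z"}]
print(mrr(ranked, truth))             # (1/2 + 1/3) / 2
print(topk_recall(ranked, truth, 1))  # neither top-1 prediction is correct
```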
Cited by: 0
Why Do Machine Learning Notebooks Crash? An Empirical Study on Public Python Jupyter Notebooks
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-06-03 DOI: 10.1109/TSE.2025.3574500
Yiran Wang;Willem Meijer;José Antonio Hernández López;Ulf Nilsson;Dániel Varró
Jupyter notebooks have become central in data science, integrating code, text and output in a flexible environment. With the rise of machine learning (ML), notebooks are increasingly used for prototyping and data analysis. However, due to their dependence on complex ML libraries and the flexible notebook semantics that allow cells to be run in any order, notebooks are susceptible to software bugs that may lead to program crashes. This paper presents a comprehensive empirical study focusing on crashes in publicly available Python ML notebooks. We collect 64,031 notebooks containing 92,542 crashes from GitHub and Kaggle, and manually analyze a sample of 746 crashes across various aspects, including crash types and root causes. Our analysis identifies unique ML-specific crash types, such as tensor shape mismatches and dataset value errors that violate API constraints. Additionally, we highlight unique root causes tied to notebook semantics, including out-of-order execution and residual errors from previous cells, which have been largely overlooked in prior research. Furthermore, we identify the most error-prone ML libraries, and analyze crash distribution across ML pipeline stages. We find that over 40% of crashes stem from API misuse and notebook-specific issues. Crashes frequently occur when using ML libraries like TensorFlow/Keras and Torch. Additionally, over 70% of the crashes occur during data preparation, model training, and evaluation or prediction stages of the ML pipeline, while data visualization errors tend to be unique to ML notebooks.

Vol. 51, no. 7, pp. 2181–2196. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11022755
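A tensor shape mismatch, one of the ML-specific crash types the study identifies, looks like this in practice. The snippet below is a minimal NumPy illustration of the failure mode, not an example from the paper's dataset (TensorFlow/Keras and Torch raise analogous errors):

```python
# Minimal illustration of a tensor shape mismatch crash: the inner
# dimensions of a matrix product disagree, so NumPy raises ValueError.
import numpy as np

weights = np.ones((3, 4))   # layer weights expect 3-dimensional input
batch = np.ones((2, 5))     # but the data batch has 5 features

try:
    batch @ weights         # (2, 5) x (3, 4): inner dimensions 5 != 3
except ValueError as e:
    print("crash:", e)
```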
Cited by: 0
Question Selection for Multimodal Code Search Synthesis Using Probabilistic Version Spaces
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-04-29 DOI: 10.1109/TSE.2025.3565387
Jiarong Wu;Yanyan Jiang;Lili Wei;Congying Xu;Shing-Chi Cheung;Chang Xu
Searching the occurrences of specific code patterns (code search) is a common task in software engineering, and programming by example (PBE) techniques have been applied to ease customizing code patterns. However, previous PBE tools only synthesize programs meeting the input-output examples, which may not always align with the user intent. To bridge this gap, this paper proposes Excalibur, a multi-modal (example and natural language description) and interactive synthesizer for code search. Excalibur ensures that the generated programs are correct for the provided examples (soundness) and include the user-intended program (bounded completeness). Furthermore, Excalibur helps the user identify the user-intended program through question-answer interaction. To minimize the required interaction efforts, question selection is crucial. To improve question selection for code search, we propose probabilistic version spaces (ProbVS), in which the user-intended program's probability is high and others are low. ProbVS combines traditional version spaces for compactly representing extensive programs and large language models (on the user-provided natural language description) for adjusting programs' probabilities to align with users' intents. Extensive experiments on a benchmark of 44 tasks demonstrated the effectiveness of Excalibur and ProbVS and demystified how ProbVS affects probability distributions and how the configurable parameters affect ProbVS.

Vol. 51, no. 6, pp. 1724–1744.
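The role of program probabilities in question selection can be illustrated with a classic information-gain heuristic: ask the question whose answer splits the candidates' probability mass most evenly. This is a generic sketch of that idea, not Excalibur's actual selection strategy, and all names are hypothetical:

```python
# Hypothetical sketch of probability-aware question selection: among
# candidate questions (inputs shown to the user), ask the one that splits
# the candidate programs' probability mass closest to 50/50, which is
# the most informative yes/no split.
def select_question(programs, probs, questions):
    """programs: candidate programs as callables returning True/False.
    probs: their probabilities (summing to 1). questions: candidate inputs."""
    def balance(q):
        yes = sum(p for prog, p in zip(programs, probs) if prog(q))
        return abs(yes - 0.5)   # 0 means a perfect 50/50 split
    return min(questions, key=balance)

# Three candidate "programs" (threshold predicates) with prior probabilities.
progs = [lambda x: x > 0, lambda x: x > 2, lambda x: x > 5]
probs = [0.5, 0.3, 0.2]
print(select_question(progs, probs, [1, 3, 7]))  # 1 splits mass 0.5 / 0.5
```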
Cited by: 0
DeepVec: State-Vector Aware Test Case Selection for Enhancing Recurrent Neural Network
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-04-28 DOI: 10.1109/TSE.2025.3565037
Zhonghao Jiang;Meng Yan;Li Huang;Weifeng Sun;Chao Liu;Song Sun;David Lo
Deep Neural Networks (DNN) have realized significant achievements across various application domains. There is no doubt that testing and enhancing a pre-trained DNN that has been deployed in an application scenario is crucial, because it can reduce the failures of the DNN. DNN-driven software testing and enhancement require large amounts of labeled data. The high cost and inefficiency caused by the large volume of data of manual labeling, and the time consumption of testing all cases in real scenarios are unacceptable. Therefore, test case selection technologies are proposed to reduce the time cost by selecting and only labeling representative test cases without compromising testing performance. Test case selection based on neuron coverage (NC) or uncertainty metrics has achieved significant success in Convolutional Neural Networks (CNN) testing. However, it is challenging to transfer these methods to Recurrent Neural Networks (RNN), which excel at text tasks, due to the mismatch in model output formats and the reliance on image-specific characteristics. Moreover, balancing the execution cost and performance of the algorithm is also indispensable. In this paper, we propose a state-vector aware test case selection method for RNN models, namely DeepVec, which reduces the cost of data labeling, saves computing resources, and balances execution cost against performance. DeepVec selects data using an uncertainty metric based on the norm of the output vector at each time step (i.e., state-vector), and a similarity metric based on the direction angle of the state-vector, because test cases with smaller state-vector norms often possess greater information entropy, and similar changes in state-vector direction angle indicate similar RNN internal states. These metrics can be calculated with just a single inference, which gives DeepVec strong bug detection and model improvement capabilities. We evaluate DeepVec on five popular datasets, containing images and texts, as well as three commonly used RNN classification models, and compare it with NC-based, uncertainty-based, and other black-box methods. Experimental results demonstrate that DeepVec achieves an average relative improvement of 12.5%-118.22% over baseline methods in selecting fault-revealing test cases, with time costs reduced to only 1% down to 0.01% of the baselines'. At the same time, we find that the absolute accuracy improvement after retraining outperforms baseline methods by 0.29%-24.01% when selecting 15% of the data to retrain.

Vol. 51, no. 6, pp. 1702–1723.
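The two metrics described (norm-based uncertainty, direction-angle similarity) can be sketched directly on a sequence of state vectors. The exact formulas below are an assumption for illustration, not the paper's definitions:

```python
# Hypothetical sketch of DeepVec-style metrics over a sequence of RNN
# state vectors (one row per time step): an uncertainty score that grows
# as the average norm shrinks, and the angles between successive states.
import numpy as np

def uncertainty(states):
    # Smaller average state-vector norm -> higher uncertainty score.
    norms = np.linalg.norm(states, axis=1)
    return 1.0 / (1.0 + norms.mean())

def direction_angles(states):
    # Angle between each pair of consecutive state vectors.
    a, b = states[:-1], states[1:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1))
    return np.arccos(np.clip(cos, -1.0, 1.0))

states = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(uncertainty(states))
print(direction_angles(states))  # two 45-degree (pi/4) turns
```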
Cited by: 0
LLMorpheus: Mutation Testing Using Large Language Models
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-04-25 DOI: 10.1109/TSE.2025.3562025
Frank Tip;Jonathan Bell;Max Schäfer
In mutation testing, the quality of a test suite is evaluated by introducing faults into a program and determining whether the program's tests detect them. Most existing approaches for mutation testing involve the application of a fixed set of mutation operators, e.g., replacing a "+" with a "-", or removing a function's body. However, certain types of real-world bugs cannot easily be simulated by such approaches, limiting their effectiveness. This paper presents a technique for mutation testing where placeholders are introduced at designated locations in a program's source code and a Large Language Model (LLM) is prompted to suggest what they could be replaced with. The technique is implemented in LLMorpheus, a mutation testing tool for JavaScript, and evaluated on 13 subject packages, considering several variations on the prompting strategy, and using several LLMs. We find LLMorpheus to be capable of producing mutants that resemble existing bugs that cannot be produced by StrykerJS, a state-of-the-art mutation testing tool. Moreover, we report on the running time, cost, and number of mutants produced by LLMorpheus, demonstrating its practicality.

Vol. 51, no. 6, pp. 1645–1665.
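The placeholder step can be illustrated with plain string manipulation: mask an expression, then build a prompt asking the LLM for replacements. LLMorpheus itself targets JavaScript and its prompt wording differs; the function below and its prompt text are hypothetical:

```python
# Hypothetical sketch of the placeholder step: replace a source expression
# with <PLACEHOLDER> and build an LLM prompt asking for substitutes.
def make_mutation_prompt(source, start, end):
    original = source[start:end]
    masked = source[:start] + "<PLACEHOLDER>" + source[end:]
    return (f"Suggest expressions that could replace <PLACEHOLDER> "
            f"in the following code (the original was `{original}`):\n\n"
            f"{masked}")

code = "function area(r) { return 3.14 * r * r; }"
start = code.index("3.14")
print(make_mutation_prompt(code, start, start + len("3.14")))
```

Each LLM suggestion would then be substituted back in place of the placeholder to form a candidate mutant, which the test suite is run against.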
Cited by: 0
On the Applicability of Code Language Models to Scientific Computing Programs
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-04-25 DOI: 10.1109/TSE.2025.3564599
Qianhui Zhao;Fang Liu;Xiao Long;Chengru Wu;Li Zhang
Scientific Computing Programming Languages (SCPLs), like MATLAB and R, are popular and widely used for computational mathematics. In recent years, pre-trained code language models (CLMs) have automated many code-related tasks, covering various general programming languages. SCPLs share many similarities with general programming languages, including similar syntactic structures and the semantics of identifiers. Despite the similarities, there exist many differences between them; for example, many numerical operations and dedicated libraries exist in SCPLs. However, there has been little comprehensive work analyzing CLMs' capabilities in the understanding and generation of pragmatic scientific computing programs. To this end, we investigate the applicability of code language models to SCPL analysis, especially focusing on real-world code in open-source repositories. We first create a benchmark that contains programs and documentation from three widely used scientific computing programming languages, then perform an adequate evaluation of existing advanced code language models on both code understanding and generation tasks using the new benchmark, and study the relations of different training strategies, model types, and model sizes to the performance on different tasks and languages. Evaluation results confirm that, compared to general programming languages, SCPLs are more challenging to understand, and especially to generate, but the use of code language models is nevertheless feasible, and the knowledge obtained from the general languages can be transferred to SCPL analysis. A deeper analysis reveals additional challenges in generating code that incorporates API calls relevant to computational mathematics. We believe that our findings can provide guidance on improving tooling and analyses for scientific programming languages, and also inspire and motivate researchers to improve the robustness of existing code language models.

Vol. 51, no. 6, pp. 1685–1701.
Cited by: 0
Testing CPS With Design Assumptions-Based Metamorphic Relations and Genetic Programming
IF 6.5 · CAS Q1 · Computer Science
IEEE Transactions on Software Engineering Pub Date: 2025-04-24 DOI: 10.1109/TSE.2025.3563121
Claudio Mandrioli;Seung Yeob Shin;Domenico Bianculli;Lionel Briand
Cyber-Physical Systems (CPSs) software is used to enforce desired behaviours on physical systems. To test the interaction between the CPS software and the system's physics, engineers provide traces of desired physical states and observe traces of the actual physical states. CPS requirements describe how closely the actual physical traces should track the desired traces. These requirements are typically defined for specific, simple input traces such as step or ramp sequences, and thus are not applicable to arbitrary inputs. This limits the availability of oracles for CPSs. Our recent work proposes an approach to testing CPSs using control-theoretical design assumptions instead of requirements. This approach circumvents the oracle problem by leveraging the control-theoretical guarantees that are provided when the design assumptions are satisfied. To address the test case generation and oracle problems, researchers have proposed metamorphic testing, which is based on the study of relations across tests, i.e., metamorphic relations (MRs). In this work, we define MRs based on the design assumptions and explore combinations of these MRs using genetic programming to generate CPS test cases. This enables the generation of CPS input traces with potentially arbitrary shapes, together with associated expected output traces. We use the deviation from the expected output traces to guide the generation of input traces that falsify the MRs. Our experiment results show that the MR-falsification provides engineers with new information, helping them identify passed and failed test cases. Furthermore, we show that the generation of traces that falsify the MRs is a non-trivial problem, which cannot be addressed with a random generation approach but is successfully addressed by our approach based on genetic search.

Vol. 51, no. 6, pp. 1666–1684. Open access PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10976605
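A design-assumption-based MR can be illustrated on a toy closed loop: if the controller is assumed linear, scaling the desired trace by k must scale the actual trace by k, for any input shape. The first-order model and the specific relation below are illustrative stand-ins, not the MRs defined in the paper:

```python
# Hypothetical sketch of a design-assumption-based metamorphic relation:
# for a linear closed loop, scaling the desired trace by k should scale
# the actual trace by k. A first-order discrete model stands in for the CPS.
def simulate(setpoints, gain=0.5):
    state, out = 0.0, []
    for sp in setpoints:
        state += gain * (sp - state)   # simple proportional tracking step
        out.append(state)
    return out

def mr_scaling_holds(setpoints, k, tol=1e-9):
    # MR check: simulate the scaled input, compare against k times the
    # baseline output; a violation would flag a failed test case.
    scaled = simulate([k * sp for sp in setpoints])
    base = simulate(setpoints)
    return all(abs(s - k * b) <= tol for s, b in zip(scaled, base))

print(mr_scaling_holds([1.0, 1.0, 2.0, 0.5], 3.0))  # True for a linear loop
```

A search-based test generator would instead look for input traces (of arbitrary shape) on which such relations are violated, using the deviation as its fitness signal.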
Cited by: 0