{"title":"Evaluating Diverse Large Language Models for Automatic and General Bug Reproduction","authors":"Sungmin Kang;Juyeon Yoon;Nargiz Askarbekkyzy;Shin Yoo","doi":"10.1109/TSE.2024.3450837","DOIUrl":"10.1109/TSE.2024.3450837","url":null,"abstract":"Bug reproduction is a critical developer activity that is also challenging to automate, as bug reports are often in natural language and thus can be difficult to transform into test cases consistently. As a result, existing techniques have mostly focused on crash bugs, which are easier to automatically detect and verify. In this work, we overcome this limitation by using large language models (LLMs), which have been demonstrated to be adept at natural language processing and code generation. By prompting LLMs to generate bug-reproducing tests, and via a post-processing pipeline to automatically identify promising generated tests, our proposed technique Libro could successfully reproduce about one-third of all bugs in the widely used Defects4J benchmark. Furthermore, our extensive evaluation on 15 LLMs, including 11 open-source LLMs, suggests that open-source LLMs also demonstrate substantial potential, with the StarCoder LLM achieving 70% of the reproduction performance of the closed-source OpenAI LLM code-davinci-002 on the large Defects4J benchmark, and 90% of performance on a held-out bug dataset likely not part of any LLM's training data. 
In addition, our experiments on LLMs of different sizes show that bug reproduction using Libro improves as LLM size increases, providing information as to which LLMs can be used with the Libro pipeline.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2677-2694"},"PeriodicalIF":6.5,"publicationDate":"2024-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142138065","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RLocator: Reinforcement Learning for Bug Localization","authors":"Partha Chakraborty;Mahmoud Alfadel;Meiyappan Nagappan","doi":"10.1109/TSE.2024.3452595","DOIUrl":"10.1109/TSE.2024.3452595","url":null,"abstract":"Software developers spend a significant portion of time fixing bugs in their projects. To streamline this process, bug localization approaches have been proposed to identify the source code files that are likely responsible for a particular bug. Prior work proposed several similarity-based machine-learning techniques for bug localization. Despite significant advances in these techniques, they do not directly optimize the evaluation measures. We argue that directly optimizing evaluation measures can positively contribute to the performance of bug localization approaches. Therefore, in this paper, we utilize Reinforcement Learning (RL) techniques to directly optimize the ranking metrics. We propose RLocator, a Reinforcement Learning-based bug localization approach. We formulate RLocator using a Markov Decision Process (MDP) to optimize the evaluation measures directly. We present the technique and experimentally evaluate it based on a benchmark dataset of 8,316 bug reports from six highly popular Apache projects. The results of our evaluation reveal that RLocator achieves a Mean Reciprocal Rank (MRR) of 0.62, a Mean Average Precision (MAP) of 0.59, and a Top 1 score of 0.46. We compare RLocator with three state-of-the-art bug localization tools, FLIM, BugLocator, and BL-GAN. Our evaluation reveals that RLocator outperforms these approaches by a substantial margin, with improvements of 38.3% in MAP, 36.73% in MRR, and 23.68% in the Top K metric. 
These findings highlight that directly optimizing evaluation measures considerably improves performance on the bug localization problem.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2695-2708"},"PeriodicalIF":6.5,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Large Language Model for Automatic Patch Correctness Assessment","authors":"Xin Zhou;Bowen Xu;Kisub Kim;DongGyun Han;Hung Huu Nguyen;Thanh Le-Cong;Junda He;Bach Le;David Lo","doi":"10.1109/TSE.2024.3452252","DOIUrl":"10.1109/TSE.2024.3452252","url":null,"abstract":"Automated Program Repair (APR) techniques have shown increasingly promising results in fixing real-world bugs. Despite their effectiveness, APR techniques still face an overfitting problem: a generated patch can be incorrect even though it passes all tests. It is time-consuming to manually evaluate the correctness of generated patches that can pass all available test cases. To address this problem, many approaches have been proposed to automatically assess the correctness of patches generated by APR techniques. These approaches are mainly evaluated within the cross-validation setting. However, for patches generated by a new or unseen APR tool, users are implicitly required to manually label a significant portion of these patches (e.g., 90% in 10-fold cross-validation) in the cross-validation setting before inferring the remaining patches (e.g., 10% in 10-fold cross-validation). To mitigate the issue, in this study, we propose LLM4PatchCorrect, a patch correctness assessment approach that adopts a large language model for code. Specifically, for patches generated by a new or unseen APR tool, LLM4PatchCorrect does not need labeled patches of this new or unseen APR tool for training but directly queries the large language model for code to get predictions on the correctness labels without training. In this way, LLM4PatchCorrect can reduce the manual labeling effort when building a model to automatically assess the correctness of generated patches of new APR tools. 
To provide knowledge regarding the automatic patch correctness assessment (APCA) task to the large language model for code, LLM4PatchCorrect leverages bug descriptions, execution traces, failing test cases, test coverage, and labeled patches generated by existing APR tools, before deciding the correctness of the unlabeled patches of a new or unseen APR tool. Additionally, LLM4PatchCorrect prioritizes labeled patches from existing APR tools that exhibit semantic similarity to those generated by new APR tools, enhancing the accuracy achieved by LLM4PatchCorrect for patches from new APR tools. Our experimental results showed that LLM4PatchCorrect can achieve an accuracy of 84.4% and an F1-score of 86.5% on average, even though no labeled patch of the new or unseen APR tool is available. In addition, our proposed technique significantly outperformed the prior state-of-the-art.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 11","pages":"2865-2883"},"PeriodicalIF":6.5,"publicationDate":"2024-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142101282","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3Erefactor: Effective, Efficient and Executable Refactoring Recommendation for Software Architectural Consistency","authors":"Jingwen Liu;Wuxia Jin;Junhui Zhou;Qiong Feng;Ming Fan;Haijun Wang;Ting Liu","doi":"10.1109/TSE.2024.3449564","DOIUrl":"10.1109/TSE.2024.3449564","url":null,"abstract":"As software continues to evolve and business functions become increasingly complex, architectural inconsistency arises when the implementation architecture deviates from the expected architecture design. This architectural problem makes maintenance difficult and requires significant effort to refactor. To assist labor-intensive refactoring, automated refactoring, such as searching for optimal refactoring solutions, has received much attention. However, there are still three limitations: the recommended refactorings are insufficiently effective in addressing architectural consistency; the search process for refactoring solutions is inefficient; and there is a lack of executable refactoring solutions. To address these limitations, we propose an effective, efficient, and executable refactoring recommendation approach, named 3Erefactor, for software architectural consistency. To achieve effective refactoring, 3Erefactor uses NSGA-II to generate refactoring solutions that minimize architectural inconsistencies at the module level and entity level. To achieve efficient refactoring, 3Erefactor leverages an architecture recovery technique to locate files requiring refactoring, helping accelerate the convergence of the refactoring algorithm. To achieve executable refactoring, 3Erefactor designs a set of refactoring executability constraint strategies during the refactoring solution search and generation, including improving refactoring pre-conditions and removing invalid operations in refactoring solutions. We evaluated our approach on six open source systems. 
Statistical analysis of our experiments shows that the refactoring solutions generated by 3Erefactor performed significantly better than three state-of-the-art approaches in terms of reducing the number of architectural inconsistencies, improving the efficiency of the refactoring algorithm, and improving the executability of refactorings.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2633-2655"},"PeriodicalIF":6.5,"publicationDate":"2024-08-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142089992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Method-Level Test-to-Code Traceability Link Construction by Semantic Correlation Learning","authors":"Weifeng Sun;Zhenting Guo;Meng Yan;Zhongxin Liu;Yan Lei;Hongyu Zhang","doi":"10.1109/TSE.2024.3449917","DOIUrl":"10.1109/TSE.2024.3449917","url":null,"abstract":"Test-to-code traceability links (TCTLs) establish links between test artifacts and code artifacts. These links enable developers and testers to quickly identify the specific pieces of code tested by particular test cases, thus facilitating more efficient debugging, regression testing, and maintenance activities. Various approaches, based on distinct concepts, have been proposed to establish method-level TCTLs, specifically linking unit tests to corresponding focal methods. Static methods, such as naming-convention-based methods, use heuristic- and similarity-based strategies. However, such methods face the following challenges: ① Developers, driven by specific scenarios and development requirements, may deviate from naming conventions, leading to TCTL identification failures. ② Static methods often overlook the rich semantics embedded within tests, leading to erroneous associations between tests and semantically unrelated code fragments. Although dynamic methods achieve promising results, they require the project to be compilable and the tests to be executable, limiting their usability. This limitation is significant for downstream tasks requiring large numbers of test-code pairs, as not all projects can meet these requirements. To tackle the abovementioned limitations, we propose a novel static method-level TCTL approach, named TestLinker. For the first challenge of existing static approaches, TestLinker introduces a two-phase TCTL framework to accommodate different project types in a triage manner. 
As for the second challenge, we employ semantic correlation learning, which learns and establishes the semantic correlations between tests and focal methods based on Pre-trained Code Models (PCMs). TestLinker further establishes mapping rules to accurately link the recommended function name to the concrete production function declaration. Empirical evaluation on a meticulously labeled dataset reveals that TestLinker significantly outperforms traditional static techniques, showing average F1-score improvements ranging from 73.48% to 202.00%. Moreover, compared to state-of-the-art dynamic methods, TestLinker, which only leverages static information, demonstrates comparable or even better performance, with an average F1-score increase of 37.40%.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2656-2676"},"PeriodicalIF":6.5,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142085527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Follow-Up Attention: An Empirical Study of Developer and Neural Model Code Exploration","authors":"Matteo Paltenghi;Rahul Pandita;Austin Z. Henley;Albert Ziegler","doi":"10.1109/TSE.2024.3445338","DOIUrl":"10.1109/TSE.2024.3445338","url":null,"abstract":"Recent neural models of code, such as OpenAI Codex and AlphaCode, have demonstrated remarkable proficiency at code generation due to the underlying attention mechanism. However, it often remains unclear how the models actually process code, and to what extent their reasoning and the way their attention mechanism scans the code match the patterns of developers. A poor understanding of the model reasoning process limits the way in which current neural models are leveraged today, so far mostly for their raw prediction. To fill this gap, this work studies how the processed attention signal of three open large language models (CodeGen, InCoder, and GPT-J) agrees with how developers look at and explore code when each answers the same sensemaking questions about code. Furthermore, we contribute an open-source eye-tracking dataset comprising 92 manually-labeled sessions from 25 developers engaged in sensemaking tasks. We empirically evaluate five heuristics that do not use attention and ten attention-based approaches that post-process the attention signal of CodeGen against our ground truth of developers exploring code, including the novel concept of follow-up attention, which exhibits the highest agreement between model and human attention. Our follow-up attention method can predict the next line a developer will look at with 47% accuracy. This outperforms the baseline prediction accuracy of 42.3%, which uses the session history of other developers to recommend the next line. 
These results demonstrate the potential of leveraging the attention signal of pre-trained models for effective code exploration.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2568-2582"},"PeriodicalIF":6.5,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10645745","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142045621","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EpiTESTER: Testing Autonomous Vehicles With Epigenetic Algorithm and Attention Mechanism","authors":"Chengjie Lu;Shaukat Ali;Tao Yue","doi":"10.1109/TSE.2024.3449429","DOIUrl":"10.1109/TSE.2024.3449429","url":null,"abstract":"Testing autonomous vehicles (AVs) under various environmental scenarios that lead the vehicles to unsafe situations is challenging. Given the infinite number of possible environmental scenarios, it is essential to find critical scenarios efficiently. To this end, we propose a novel testing method, named EpiTESTER, by taking inspiration from epigenetics, which enables species to adapt to sudden environmental changes. In particular, EpiTESTER adopts gene silencing as its epigenetic mechanism, which regulates gene expression to prevent the expression of a certain gene, and the probability of gene expression is dynamically computed as the environment changes. Given different data modalities (e.g., images, lidar point clouds) in the context of AV, EpiTESTER benefits from a multi-modal fusion transformer to extract high-level feature representations from environmental factors. Next, it calculates probabilities based on these features with the attention mechanism. To assess the cost-effectiveness of EpiTESTER, we compare it with a probabilistic search algorithm (Simulated Annealing, SA), a classical genetic algorithm (GA) (i.e., without any epigenetic mechanism implemented), and EpiTESTER with equal probability for each gene. We evaluate EpiTESTER with six initial environments from CARLA, an open-source simulator for autonomous driving research, and two end-to-end AV controllers, Interfuser and TCP. 
Our results show that EpiTESTER achieved promising performance in identifying critical scenarios compared to the baselines, showing that applying epigenetic mechanisms is a good option for solving practical problems.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2614-2632"},"PeriodicalIF":6.5,"publicationDate":"2024-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142045638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Yuga: Automatically Detecting Lifetime Annotation Bugs in the Rust Language","authors":"Vikram Nitin;Anne Mulhern;Sanjay Arora;Baishakhi Ray","doi":"10.1109/TSE.2024.3447671","DOIUrl":"10.1109/TSE.2024.3447671","url":null,"abstract":"The Rust programming language is becoming increasingly popular among systems programmers due to its efficient performance and robust memory safety guarantees. Rust employs an ownership model to ensure these guarantees by allowing each value to be owned by only one identifier at a time. It uses the concepts of borrowing and lifetimes to enable other variables to temporarily borrow values. Despite its benefits, security vulnerabilities have been reported in Rust projects, often attributed to the use of “unsafe” Rust code. These vulnerabilities, in part, arise from incorrect lifetime annotations on function signatures. However, existing tools fail to detect these bugs, primarily because such bugs are rare, challenging to detect through dynamic analysis, and require explicit memory models. To overcome these limitations, we characterize incorrect lifetime annotations as a source of memory safety bugs and leverage this understanding to devise a novel static analysis tool, Yuga, to detect potential lifetime annotation bugs. Yuga uses a multi-phase analysis approach, starting with a quick pattern-matching algorithm to identify potentially buggy components and then conducting a flow- and field-sensitive alias analysis to confirm the bugs. We also curate new datasets of lifetime annotation bugs. 
Yuga successfully detects bugs with good precision on these datasets, and we make the code and datasets publicly available.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2602-2613"},"PeriodicalIF":6.5,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142042494","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"iTCRL: Causal-Intervention-Based Trace Contrastive Representation Learning for Microservice Systems","authors":"Xiangbo Tian;Shi Ying;Tiangang Li;Mengting Yuan;Ruijin Wang;Yishi Zhao;Jianga Shang","doi":"10.1109/TSE.2024.3446532","DOIUrl":"10.1109/TSE.2024.3446532","url":null,"abstract":"Nowadays, microservice architecture has become the mainstream way of delivering cloud applications. Distributed tracing is crucial for preserving the observability of microservice systems. However, existing trace representation approaches only concentrate on operations, relationships and metrics related to service invocations. They ignore service events, which denote meaningful, singular points in time during a service's duration. In this paper, we propose iTCRL, a novel trace contrastive representation learning approach based on causal intervention. This approach first constructs a unified graph representation for each trace to describe the runtime status of service events in traces and the complex relationships between them. Then, Causal-intervention-based Trace Contrastive Learning is proposed, which learns trace representations from a causal perspective based on the unified graph representations of traces. It uses causal intervention to generate contrastive views, a heterogeneous graph neural network-based trace encoder to learn trace representations, and the direct causal effect to guide the training of the trace encoder. 
Experimental results on three datasets show that iTCRL outperforms all baselines in terms of trace classification, trace anomaly detection, trace sampling and noise robustness, and also validate the contribution of Causal-intervention-based Trace Contrastive Learning.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2583-2601"},"PeriodicalIF":6.5,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142022071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Runtime Verification and Field-Based Testing for ROS-Based Robotic Systems","authors":"Ricardo Caldas;Juan Antonio Piñera García;Matei Schiopu;Patrizio Pelliccione;Genaína Rodrigues;Thorsten Berger","doi":"10.1109/TSE.2024.3444697","DOIUrl":"10.1109/TSE.2024.3444697","url":null,"abstract":"Robotic systems are becoming pervasive and are being adopted in an increasing number of domains, such as manufacturing, healthcare, and space exploration. To this end, engineering software has emerged as a crucial discipline for building maintainable and reusable robotic systems. The field of robotics software engineering research has received increasing attention, fostering autonomy as a fundamental goal. However, robotics developers still struggle to achieve this goal, given that simulation cannot realistically emulate real-world phenomena. Robots also need to operate in unpredictable and uncontrollable environments, which require safe and trustworthy self-adaptation capabilities implemented in software. Typical techniques to address these challenges are runtime verification, field-based testing, and mitigation techniques that enable fail-safe solutions. However, there is no clear guidance on how to architect ROS-based systems to enable and facilitate runtime verification and field-based testing. This paper aims to fill this gap by providing guidelines that can help developers and quality assurance (QA) teams when developing, verifying or testing their robots in the field. These guidelines are carefully tailored to address the challenges and requirements of testing robotics systems in real-world scenarios. We (i) conducted a literature review on studies addressing runtime verification and field-based testing for robotic systems, (ii) mined ROS-based application repositories, and (iii) validated the applicability, clarity, and usefulness of the guidelines via two questionnaires with 55 answers overall. 
We contribute 20 guidelines: 8 for developers and 12 for QA teams, formulated for researchers and practitioners in robotic software engineering. Finally, we map our guidelines to the open challenges to date in runtime verification and field-based testing for ROS-based systems, and we outline promising research directions in the field. Guidelines website and replication package: https://ros-rvft.github.io.","PeriodicalId":13324,"journal":{"name":"IEEE Transactions on Software Engineering","volume":"50 10","pages":"2544-2567"},"PeriodicalIF":6.5,"publicationDate":"2024-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10638820","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142007293","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}