Enhancing autonomous driving simulations: A hybrid metamorphic testing framework with metamorphic relations generated by GPT

IF 4.3 | Zone 2, Computer Science | Q2 COMPUTER SCIENCE, INFORMATION SYSTEMS

Information and Software Technology | Pub Date: 2025-07-12 | DOI: 10.1016/j.infsof.2025.107828

Yifan Zhang, Tsong Yueh Chen, Matthew Pike, Dave Towey, Zhihao Ying, Zhi Quan Zhou

Information and Software Technology, Volume 187, Article 107828. Journal Article. https://www.sciencedirect.com/science/article/pii/S0950584925001673

Citations: 0

Abstract

Context:

Autonomous Driving Systems (ADSs) have rapidly developed over the past decade. Given the complexity and cost of testing ADSs, advanced simulation tools like the CARLA simulator are essential for efficient algorithm development and validation. However, the intricacies of autonomous driving (AD) simulations pose challenges for software testing, particularly the oracle problem, which relates to the difficulty in determining the correctness of outputs within reasonable timeframes. While many studies validate ADS algorithms using simulations, few address the validity of the simulated data, a fundamental premise for ADS testing.

Objective:

This study addresses the oracle problem in AD simulations by employing Metamorphic Testing (MT) and Metamorphic Relations (MRs) to detect software defects in the CARLA simulator. Additionally, we explore AI-driven approaches, specifically integrating ChatGPT’s customizable features to enhance MR generation and refinement.
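As a minimal illustration of how an MR sidesteps the oracle problem, the sketch below checks a relation between two runs instead of checking either output against a known-correct value. Everything in it is an assumption made for illustration: the toy function `braking_distance` stands in for a simulator output, the seeded defect is invented, and none of it is taken from the paper or from the CARLA API.

```python
import math
from typing import Callable


def braking_distance(speed_mps: float, decel_mps2: float) -> float:
    """Toy stand-in for a simulator output (hypothetical, not CARLA)."""
    return speed_mps ** 2 / (2.0 * decel_mps2)


def buggy_braking_distance(speed_mps: float, decel_mps2: float) -> float:
    """Same model with a seeded defect: speed is silently clamped at 30 m/s."""
    return min(speed_mps, 30.0) ** 2 / (2.0 * decel_mps2)


def mr_speed_scaling_holds(
    sut: Callable[[float, float], float],
    speed_mps: float,
    decel_mps2: float,
    k: float = 2.0,
    rel_tol: float = 1e-6,
) -> bool:
    """MR: scaling the initial speed by k should scale the distance by k**2.

    No expected distance is needed; only the relation between the
    source run and the follow-up run is checked.
    """
    source = sut(speed_mps, decel_mps2)
    follow_up = sut(k * speed_mps, decel_mps2)
    return math.isclose(follow_up, k ** 2 * source, rel_tol=rel_tol)
```

With a source speed of 20 m/s the buggy variant violates the relation, while at 10 m/s it still satisfies it, which is one reason to run an MR over many source inputs rather than a single one.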

Method:

We propose a human-AI hybrid MT framework that combines human inputs with AI-driven automation to generate and refine MRs. The framework uses the GPT-MR generator, a customized large language model (LLM) based on Metamorphic Relation Patterns (MRPs) and ChatGPT, to produce MRs according to user specifications. These MRs are then refined by MT experts and fed into a test harness, automating test-case creation and execution while supporting diverse parameter inputs.
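The test-harness idea described above (MRs in, automated source and follow-up test cases out, swept over parameter inputs) might look roughly like the following sketch. It is not the authors' harness: the `MetamorphicRelation` class, the `run_harness` function, the toy system under test, and the parameter grid are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product
from math import isclose
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


@dataclass(frozen=True)
class MetamorphicRelation:
    name: str
    transform: Callable[[Params], Params]    # source input -> follow-up input
    relation: Callable[[float, float], bool]  # (source out, follow-up out) -> holds?


def distance_travelled(p: Params) -> float:
    """Toy system under test: fixed-step integration at constant speed."""
    steps = round(p["duration"] / p["dt"])
    position = 0.0
    for _ in range(steps):
        position += p["speed"] * p["dt"]
    return position


def buggy_distance_travelled(p: Params) -> float:
    """Seeded defect for demonstration: the final time step is dropped."""
    steps = round(p["duration"] / p["dt"])
    return sum(p["speed"] * p["dt"] for _ in range(steps - 1))


def run_harness(
    sut: Callable[[Params], float],
    relations: List[MetamorphicRelation],
    grid: Dict[str, List[float]],
) -> List[Tuple[str, Params]]:
    """Run every MR on every parameter combination; return the violations."""
    keys = list(grid)
    failures: List[Tuple[str, Params]] = []
    for values in product(*(grid[k] for k in keys)):
        source = dict(zip(keys, values))
        source_out = sut(source)
        for mr in relations:
            follow_up_out = sut(mr.transform(source))
            if not mr.relation(source_out, follow_up_out):
                failures.append((mr.name, source))
    return failures


RELATIONS = [
    MetamorphicRelation(
        "doubling duration doubles distance",
        lambda p: {**p, "duration": 2 * p["duration"]},
        lambda s, f: isclose(f, 2 * s, rel_tol=1e-9),
    ),
    MetamorphicRelation(
        "halving the time step preserves distance",
        lambda p: {**p, "dt": p["dt"] / 2},
        lambda s, f: isclose(f, s, rel_tol=1e-9),
    ),
]

GRID = {"speed": [5.0, 10.0, 20.0], "duration": [2.0, 4.0], "dt": [0.1, 0.05]}
```

Calling `run_harness(distance_travelled, RELATIONS, GRID)` returns an empty list, whereas the seeded-defect variant produces violations. Keeping each MR as data (a transform plus a relation) is one way to let expert-refined relations be added without touching the execution loop.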

Results:

The GPT-MR generator produced MRs that led to the discovery of four significant defects in the CARLA simulator, demonstrating the MRs' effectiveness in identifying software flaws. The test harness enabled efficient, automated testing across multiple modules and vehicle-control approaches, which enhanced the robustness and efficiency of our methods.

Conclusions:

Our study highlights the effectiveness of MT and MRPs in addressing the oracle problem for AD simulations, enhancing software reliability, and ensuring robust validation processes. The combination of AI-driven tools and human knowledge offers a structured methodology for validating simulated data and ADS performance, contributing to more reliable ADS development and testing.


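Source journal

Information and Software Technology (Engineering & Technology - Computer Science: Software Engineering)

CiteScore: 9.10

Self-citation rate: 7.70%

Articles published: 164

Review time: 9.6 weeks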

About the journal: Information and Software Technology is the international archival journal focusing on research and experience that contributes to the improvement of software development practices. The journal's scope includes methods and techniques to better engineer software and manage its development. Articles submitted for review should have a clear component of software engineering or address ways to improve the engineering and management of software development. Areas covered by the journal include:
• Software management, quality and metrics
• Software processes
• Software architecture, modelling, specification, design and programming
• Functional and non-functional software requirements
• Software testing and verification & validation
• Empirical studies of all aspects of engineering and managing software development

Short Communications is a new section dedicated to short papers addressing new ideas, controversial opinions, "negative" results and much more. Read the Guide for Authors for more information. The journal encourages and welcomes submissions of systematic literature studies (reviews and maps) within the scope of the journal. Information and Software Technology is the premier outlet for systematic literature studies in software engineering.