A physics-informed train on synthetic and test on real method for evaluating large language model-generated safety-critical traffic scenarios

IF 9.1; Zone 1 (Engineering and Technology); JCR Q1, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computer-Aided Civil and Infrastructure Engineering Pub Date : 2025-09-17 DOI:10.1111/mice.70071

Mo Jia, Qixiu Cheng, Chengkun Tao, Yetao Hu, Qi Hong, Wenzhe Cheng, Zhiyuan Liu

{"title":"A physics-informed train on synthetic and test on real method for evaluating large language model-generated safety-critical traffic scenarios","authors":"Mo Jia, Qixiu Cheng, Chengkun Tao, Yetao Hu, Qi Hong, Wenzhe Cheng, Zhiyuan Liu","doi":"10.1111/mice.70071","DOIUrl":null,"url":null,"abstract":"Corner cases, which are rare and high-risk scenarios such as safety-critical behaviors in autonomous vehicle operations, present significant modeling challenges due to their low occurrence probability and limited data availability. Large language models (LLMs) offer new potential for synthesizing such scenarios, but existing evaluation metrics are inadequate because corner case data typically lack one-to-one mapping to real samples and have extremely limited instances. To address this, we propose a two-stage evaluation framework, that is, a physics-informed train on synthetic and test on real (PI-TSTR) framework. Using safety-critical car-following (CF) scenarios as an example, we design a prompting and interpolation strategy to guide LLMs in generating physically feasible synthetic follower trajectories from real leading vehicle inputs. We then evaluate the generated data by training several CF models, that is, extended S-shaped three-parameter (ES3) model, Gipps model, optimal velocity model (OVM), improved full velocity difference model (IFVDM), intelligent driver model (IDM), and testing their performances on real-world data. The CF models trained on LLM-generated trajectories show strong generalization to real scenarios, validating the quality of the synthetic data. 
This framework provides a physics-grounded approach for evaluating LLM-generated data in safety-critical, data-scarce domains.","PeriodicalId":156,"journal":{"name":"Computer-Aided Civil and Infrastructure Engineering","volume":"316 1","pages":""},"PeriodicalIF":9.1000,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computer-Aided Civil and Infrastructure Engineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1111/mice.70071","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}

Citations: 0

Abstract

Corner cases are rare, high-risk scenarios, such as safety-critical behaviors in autonomous vehicle operations. They are difficult to model because they occur with low probability and little data are available. Large language models (LLMs) offer new potential for synthesizing such scenarios, but existing evaluation metrics are inadequate because corner-case data typically lack a one-to-one mapping to real samples and comprise very few instances. To address this, we propose a two-stage evaluation framework: physics-informed train on synthetic and test on real (PI-TSTR). Using safety-critical car-following (CF) scenarios as an example, we design a prompting and interpolation strategy that guides LLMs to generate physically feasible synthetic follower trajectories from real leading-vehicle inputs. We then evaluate the generated data by training several CF models, namely the extended S-shaped three-parameter (ES3) model, the Gipps model, the optimal velocity model (OVM), the improved full velocity difference model (IFVDM), and the intelligent driver model (IDM), and testing their performance on real-world data. The CF models trained on LLM-generated trajectories generalize well to real scenarios, which supports the quality of the synthetic data. This framework provides a physics-grounded approach for evaluating LLM-generated data in safety-critical, data-scarce domains.
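The train-on-synthetic, test-on-real idea can be illustrated with a minimal sketch, which is not the authors' implementation. It calibrates an IDM follower on a synthetic gap series and then scores it against a separate "real" series. Everything specific here is an assumption made for illustration: the toy hard-braking leader, the parameter values, the grid search used as a stand-in calibration routine, and the two gap series, which are both simulated placeholders rather than LLM output or recorded data.

```python
import math

def idm_accel(v, gap, dv, p):
    """Standard IDM acceleration; dv is follower speed minus leader speed."""
    v0, T, a, b, s0 = p  # desired speed, time headway, max accel, comfortable decel, jam gap
    s_star = s0 + max(0.0, v * T + v * dv / (2.0 * math.sqrt(a * b)))
    return a * (1.0 - (v / v0) ** 4 - (s_star / max(gap, 0.1)) ** 2)

def simulate_gaps(leader_x, leader_v, gap0, p, dt=0.1):
    """Roll the follower forward from the leader trajectory; return the gap series."""
    x, v = leader_x[0] - gap0, leader_v[0]
    gaps = []
    for xl, vl in zip(leader_x, leader_v):
        gap = xl - x
        gaps.append(gap)
        v_new = max(0.0, v + idm_accel(v, gap, v - vl, p) * dt)  # no reversing
        x += 0.5 * (v + v_new) * dt
        v = v_new
    return gaps

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def hard_braking_leader(dt=0.1, steps=150):
    """Toy leader (assumption): cruise at 20 m/s, brake at 4 m/s^2 for 3 s, then hold."""
    xs, vs, x, v = [], [], 0.0, 20.0
    for k in range(steps):
        xs.append(x)
        vs.append(v)
        acc = -4.0 if 50 <= k < 80 else 0.0
        x += v * dt + 0.5 * acc * dt * dt
        v += acc * dt
    return xs, vs

def calibrate(leader_x, leader_v, gap0, target_gaps):
    """Grid search over (T, a) only; the other IDM parameters stay fixed (assumption)."""
    grid = [(T, a) for T in (1.0, 1.25, 1.5, 1.75, 2.0) for a in (0.5, 1.0, 1.5, 2.0)]

    def loss(Ta):
        p = (30.0, Ta[0], Ta[1], 2.0, 2.0)
        return rmse(simulate_gaps(leader_x, leader_v, gap0, p), target_gaps)

    return min(grid, key=loss)

leader_x, leader_v = hard_braking_leader()
gap0 = 32.0

# Placeholder data (assumption): both series are simulated stand-ins.
synthetic_gaps = simulate_gaps(leader_x, leader_v, gap0, (30.0, 1.5, 1.0, 2.0, 2.0))
real_gaps = simulate_gaps(leader_x, leader_v, gap0, (30.0, 1.6, 1.1, 2.0, 2.0))

# Stage 1: train (calibrate) on the synthetic follower trajectory.
T_fit, a_fit = calibrate(leader_x, leader_v, gap0, synthetic_gaps)

# Stage 2: test the calibrated model against the real follower trajectory.
fitted = (30.0, T_fit, a_fit, 2.0, 2.0)
tstr_rmse = rmse(simulate_gaps(leader_x, leader_v, gap0, fitted), real_gaps)
print(f"fitted T={T_fit}, a={a_fit}, gap RMSE on real data = {tstr_rmse:.3f} m")
```

In this reading, a low test error relative to a model calibrated directly on real data would suggest the synthetic trajectories carry realistic car-following behavior. The same loop could be repeated for the other CF models named in the abstract.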

Source journal

Computer-Aided Civil and Infrastructure Engineering (Engineering and Technology: Civil Engineering)

CiteScore

17.60

Self-citation rate

19.80%

Articles published

146

Review time

1 month

About the journal: Computer-Aided Civil and Infrastructure Engineering is a scholarly, peer-reviewed archival journal that links advances in computer technology with civil and infrastructure engineering. It publishes original articles on novel computational techniques and innovative applications of computers, concentrating on recent progress in computer and information technologies and on the development and application of emerging computing paradigms. Its scope covers bridge, construction, environmental, highway, geotechnical, structural, transportation, and water resources engineering, and extends to the management of infrastructure systems such as highways, bridges, pavements, airports, and utilities. Topics include artificial intelligence, cognitive modeling, concurrent engineering, database management, distributed computing, evolutionary computing, fuzzy logic, genetic algorithms, geometric modeling, internet-based technologies, knowledge discovery and engineering, machine learning, mobile computing, multimedia technologies, networking, neural network computing, optimization and search, parallel processing, robotics, smart structures, software engineering, virtual reality, and visualization techniques.