Synthetic versus real: an analysis of critical scenarios for autonomous vehicle testing
Qunying Song, Avner Bensoussan, Mohammad Reza Mousavi
Automated Software Engineering, vol. 32, no. 2 (published 2025-04-09). DOI: 10.1007/s10515-025-00499-4
Abstract
With the emergence of autonomous vehicles comes the requirement for adequate and rigorous testing, particularly in critical scenarios that are both challenging and potentially hazardous. Generating synthetic, simulation-based critical scenarios for testing autonomous vehicles has therefore received considerable interest, yet it is unclear how such scenarios relate to actual crash or near-crash scenarios in the real world. Consequently, their realism is unknown. In this paper, we define realism as the degree of similarity of synthetic critical scenarios to real-world critical scenarios. We propose a methodology to measure realism using two metrics: attribute distribution and Euclidean distance. The methodology extracts various attributes from synthetic and real-world critical scenario datasets and performs a set of statistical tests to compare their distributions and distances. As a proof of concept for our methodology, we compare synthetic collision scenarios from DeepScenario against real autonomous vehicle collisions collected by the California Department of Motor Vehicles (DMV), to analyse how well DeepScenario's synthetic collision scenarios align with real autonomous vehicle collisions recorded in California. We focus on five key attributes that can be extracted from both datasets and analyse the attribute distributions and the distances between scenarios in the two datasets. Further, based on our analysis, we derive recommendations for improving the realism of synthetic scenarios. Our study of realism provides a framework that can be replicated and extended to other datasets of both real-world and synthetically generated scenarios.
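The abstract names the two realism metrics but not the exact statistical tests or the five attributes. The following is a minimal Python sketch, under stated assumptions, of how such a comparison could look: a chi-square homogeneity statistic with a permutation p-value for one categorical attribute, and, as one possible reading of the distance metric, the Euclidean distance from each synthetic scenario to its nearest real scenario after min-max normalisation. The function names, the toy attributes (road type, speed, lane count), and the choice of tests are hypothetical and are not taken from the paper.

```python
import math
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical representation: a categorical attribute is a list of labels,
# and a scenario is a tuple of numeric attributes (e.g. speed, lane count).
# The choice of tests below is an assumption; the abstract does not name them.


def chi_square_homogeneity(a: Sequence[str], b: Sequence[str]) -> float:
    """Chi-square statistic for a 2 x k table: do two datasets share the
    same category frequencies for one attribute? Larger means less similar."""
    count_a, count_b = Counter(a), Counter(b)
    total = len(a) + len(b)
    stat = 0.0
    for category in set(count_a) | set(count_b):
        column = count_a[category] + count_b[category]
        for observed, n in ((count_a[category], len(a)), (count_b[category], len(b))):
            expected = n * column / total
            stat += (observed - expected) ** 2 / expected
    return stat


def permutation_p_value(
    a: Sequence[str],
    b: Sequence[str],
    statistic: Callable[[Sequence[str], Sequence[str]], float],
    n_perm: int = 2000,
    seed: int = 0,
) -> float:
    """Share of random relabellings of the pooled data whose statistic is at
    least as large as the observed one (with +1 smoothing)."""
    rng = random.Random(seed)
    observed = statistic(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so float rounding does not hide exact ties.
        if statistic(pooled[: len(a)], pooled[len(a):]) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def min_max_normalise(rows: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Rescale every attribute to [0, 1] so no single unit dominates."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        tuple((x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_real_distances(
    synthetic: List[Tuple[float, ...]], real: List[Tuple[float, ...]]
) -> List[float]:
    """Euclidean distance from each synthetic scenario to its closest real
    scenario, after normalising both datasets on a shared scale."""
    rows = min_max_normalise(list(synthetic) + list(real))
    syn, rea = rows[: len(synthetic)], rows[len(synthetic):]
    return [min(math.dist(s, r) for r in rea) for s in syn]


if __name__ == "__main__":
    # Toy data invented for illustration; not results from the paper.
    synthetic_road = ["junction"] * 30 + ["straight"] * 10
    real_road = ["junction"] * 12 + ["straight"] * 28
    stat = chi_square_homogeneity(synthetic_road, real_road)
    p = permutation_p_value(synthetic_road, real_road, chi_square_homogeneity)
    print(f"road type: chi2={stat:.2f}, permutation p={p:.4f}")

    synthetic = [(30.0, 2.0), (45.0, 3.0), (60.0, 4.0)]  # (speed km/h, lanes)
    real = [(25.0, 2.0), (50.0, 2.0)]
    print(nearest_real_distances(synthetic, real))
```

The permutation test is used here only to keep the sketch dependency-free; with SciPy available, `scipy.stats.chi2_contingency` (categorical attributes) or `scipy.stats.ks_2samp` (numeric attributes) would be conventional alternatives. Normalising before measuring distance matters because otherwise an attribute measured in large units, such as speed, would dominate the Euclidean distance.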
Journal description:
This journal publishes research papers, tutorial papers, surveys, and accounts of significant industrial experience in the foundations, techniques, tools, and applications of automated software engineering technology. This includes the study of techniques for constructing, understanding, adapting, and modeling software artifacts and processes.
Coverage in Automated Software Engineering spans both automatic and collaborative systems, as well as computational models of human software engineering activities. In addition, the journal presents knowledge representations and artificial intelligence techniques applicable to automated software engineering, and formal techniques that support the field or provide its theoretical foundations. It also includes reviews of books, software, conferences, and workshops.