Assessing the efficacy of pre-trained large language models in analyzing autonomous vehicle field test disengagements
Melika Ansarinejad, Sherif M. Gaweesh, Mohamed M. Ahmed
Accident Analysis & Prevention, Volume 220, Article 108178 (published 2025-07-25)
DOI: 10.1016/j.aap.2025.108178 · https://www.sciencedirect.com/science/article/pii/S0001457525002647
Citations: 0
Abstract
This study evaluates the efficacy of pre-trained large language models (LLMs) in analyzing disengagement reports from Level 2–3 autonomous vehicle (AV) field tests, using data provided by the California Department of Motor Vehicles. Disengagement reports document instances in which autonomous vehicles, tested under the Autonomous Vehicle Tester (AVT) and AVT Driverless Programs, transition from autonomous to manual control. These disengagements occur when human intervention is required because of incidents or limitations in the operational design domain that prevent AVs from functioning properly. Understanding the factors leading to disengagements is pivotal for assessing AV performance and for guiding infrastructure owners and operators (IOOs) on needed modifications. Manual approaches to analyzing disengagement data are labor-intensive and prone to human error. Our research investigates the capability of LLMs to automate this analysis, focusing on identifying patterns, categorizing disengagement causes, and extracting meaningful insights from extensive datasets. GPT-4o was employed as the LLM to analyze the disengagement reports. The study aims to measure the accuracy, efficiency, and reliability of these models in comparison with traditional techniques. The application of LLMs demonstrated significant potential in identifying insights from the disengagement dataset while effectively processing the textual data, achieving an accuracy of 87%. Several data limitations were encountered, including inconsistencies in disengagement descriptions across manufacturers, which posed challenges to standardizing the analysis. Additionally, the disengagement reports offered limited detail on the specific causes of disengagements and the surrounding conditions, restricting the depth of insights that could be drawn. Despite these challenges, our findings indicate that LLMs can substantially enhance the speed and precision of analyzing AV disengagement reports in a cost-effective manner, offering valuable insights that can inform further research and development in AV technology and safety protocols.
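The abstract does not disclose the authors' exact prompting pipeline, so the following is only a minimal illustrative sketch of the kind of workflow described: sending a free-text disengagement description to GPT-4o and asking it to assign one cause category. It assumes the OpenAI Python SDK, an API key in the environment, and a hypothetical category list; the real study's categories and prompts may differ.

```python
# Illustrative sketch (not the paper's published pipeline): classify one
# disengagement description into a cause category using GPT-4o.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical cause categories for illustration only.
CATEGORIES = [
    "perception limitation",
    "planning/control issue",
    "other road user behavior",
    "environmental/road conditions",
    "precautionary takeover by driver",
]

def classify_disengagement(description: str) -> str:
    """Map a free-text disengagement description to exactly one category."""
    prompt = (
        "Classify the following autonomous vehicle disengagement description "
        f"into exactly one of these categories: {', '.join(CATEGORIES)}.\n\n"
        f"Description: {description}\n\n"
        "Answer with the category name only."
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic output for reproducible labeling
    )
    return response.choices[0].message.content.strip()

# Example usage with a made-up report entry:
# print(classify_disengagement(
#     "Safety driver took control due to heavy rain obscuring lane markings."))
```

In practice, a batch of such labels could be compared against manually coded reports to compute the kind of classification accuracy reported in the study.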
About the journal:
Accident Analysis & Prevention provides wide coverage of the general areas relating to accidental injury and damage, including the pre-injury and immediate post-injury phases. Published papers deal with medical, legal, economic, educational, behavioral, theoretical or empirical aspects of transportation accidents, as well as with accidents at other sites. Selected topics within the scope of the Journal may include: studies of human, environmental and vehicular factors influencing the occurrence, type and severity of accidents and injury; the design, implementation and evaluation of countermeasures; biomechanics of impact and human tolerance limits to injury; modelling and statistical analysis of accident data; policy, planning and decision-making in safety.