Foundation model-assisted interpretable vehicle behavior decision making

IF 7.6 | Zone 1, Computer Science | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Knowledge-Based Systems Pub Date : 2025-06-11 DOI:10.1016/j.knosys.2025.113868

Shiyu Meng, Yi Wang, Yawen Cui, Lap-Pui Chau

Knowledge-Based Systems, Volume 324, Article 113868. Full text: https://www.sciencedirect.com/science/article/pii/S0950705125009141

Citations: 0

Abstract

Intelligent autonomous driving systems must achieve accurate perception and driving decisions to enhance their effectiveness and adoption. Driving behavior decision models now achieve high performance thanks to deep learning. However, most existing approaches lack interpretability, which reduces user trust and hinders widespread adoption. Some efforts pursue transparency through strategies such as heat maps, cost volumes, and auxiliary tasks, but they often provide limited model interpretation or require additional annotations. In this paper, we present a novel unified framework that tackles these issues by integrating ego-vehicle behavior decisions with human-centric, language-based interpretation prediction from ego-view visual input. First, we propose a self-supervised, class-agnostic object segmentor module, based on the Segment Anything Model and a 2-D light adapter strategy, to capture the overall surrounding cues without any extra segmentation mask labels. Second, a semantic extractor generates hierarchical semantic-level cues. Third, a fusion module generates refined global features by combining the class-agnostic object features and the semantic-level features through a self-attention mechanism. Finally, vehicle behavior decisions and possible human-centric interpretations are jointly generated from the global fusion context. Experimental results across various settings on public datasets demonstrate the superiority and effectiveness of the proposed solution.
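The fusion and joint-prediction steps described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the feature sizes, the random weights, the mean pooling, and the treatment of interpretations as a fixed set of candidate labels scored with a sigmoid are all assumptions made for illustration, and every name (`fuse_and_predict`, `w_action`, `w_explain`) is hypothetical. It only shows how object-level and semantic-level feature tokens might be mixed by single-head scaled dot-product self-attention and then fed to two output heads.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over a token list."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    scores = [[s / scale for s in row] for row in matmul(q, transpose(k))]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def fuse_and_predict(object_feats, semantic_feats, params):
    """Fuse two token sets with self-attention, then predict jointly."""
    # Assumption: both feature sets share one embedding size and are
    # simply concatenated along the token axis before attention.
    tokens = object_feats + semantic_feats
    fused, weights = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    # Assumption: mean pooling yields the global fusion context.
    context = [sum(col) / len(fused) for col in zip(*fused)]
    # Head 1: one behavior decision (softmax over hypothetical action classes).
    action_probs = softmax(matmul([context], params["w_action"])[0])
    # Head 2: multi-label scores over hypothetical candidate interpretations.
    logits = matmul([context], params["w_explain"])[0]
    explanation_probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return action_probs, explanation_probs, weights


rng = random.Random(0)
DIM, N_ACTIONS, N_EXPLANATIONS = 8, 4, 6  # hypothetical sizes
params = {
    "wq": random_matrix(DIM, DIM, rng),
    "wk": random_matrix(DIM, DIM, rng),
    "wv": random_matrix(DIM, DIM, rng),
    "w_action": random_matrix(DIM, N_ACTIONS, rng),
    "w_explain": random_matrix(DIM, N_EXPLANATIONS, rng),
}
object_feats = random_matrix(5, DIM, rng)    # stand-in for segmentor outputs
semantic_feats = random_matrix(3, DIM, rng)  # stand-in for semantic cues
action_probs, explanation_probs, weights = fuse_and_predict(
    object_feats, semantic_feats, params
)
print("action probabilities:", [round(p, 3) for p in action_probs])
print("interpretation scores:", [round(p, 3) for p in explanation_probs])
```

Sharing one fused context between both heads is what makes the prediction joint in this sketch: the decision and the interpretation scores are computed from the same attended features rather than from separate models.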

Source journal

Knowledge-Based Systems (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore

14.80

Self-citation rate

12.50%

Articles published

1245

Review time

7.8 months

Journal description: Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built on knowledge-based and other artificial intelligence techniques. The journal aims to support human prediction and decision-making through data science and computation techniques, provide balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.