{"title":"Foundation model-assisted interpretable vehicle behavior decision making","authors":"Shiyu Meng, Yi Wang, Yawen Cui, Lap-Pui Chau","doi":"10.1016/j.knosys.2025.113868","DOIUrl":null,"url":null,"abstract":"<div><div>Intelligent autonomous driving systems must achieve accurate perception and driving decisions to enhance their effectiveness and adoption. Currently, driving behavior decisions have achieved high performance thanks to deep learning technology. However, most existing approaches lack interpretability, reducing user trust and hindering widespread adoption. While some efforts focus on transparency through strategies like heat maps, cost-volume, and auxiliary tasks, they often provide limited model interpretation or require additional annotations. In this paper, we present a novel unified framework to tackle these issues by integrating ego-vehicle behavior decisions with human-centric language-based interpretation prediction from ego-view visual input. First, we propose a self-supervised class-agnostic object Segmentor module based on Segment Anything Model and 2-D light adapter strategy, to capture the overall surrounding cues without any extra segmentation mask labels. Second, the semantic extractor is adopted to generate the hierarchical semantic-level cues. Subsequently, a fusion module is designed to generate the refined global features by incorporating the class-agnostic object features and semantic-level features using a self-attention mechanism. Finally, vehicle behavior decisions and possible human-centric interpretations are jointly generated based on the global fusion context. The experimental results across various settings on the public datasets demonstrate the superiority and effectiveness of our proposed solution.</div></div>","PeriodicalId":49939,"journal":{"name":"Knowledge-Based Systems","volume":"324 ","pages":"Article 113868"},"PeriodicalIF":7.6000,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Knowledge-Based Systems","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0950705125009141","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Intelligent autonomous driving systems must achieve accurate perception and driving decisions to enhance their effectiveness and adoption. Currently, driving behavior decisions have achieved high performance thanks to deep learning technology. However, most existing approaches lack interpretability, reducing user trust and hindering widespread adoption. While some efforts focus on transparency through strategies like heat maps, cost-volume, and auxiliary tasks, they often provide limited model interpretation or require additional annotations. In this paper, we present a novel unified framework to tackle these issues by integrating ego-vehicle behavior decisions with human-centric language-based interpretation prediction from ego-view visual input. First, we propose a self-supervised class-agnostic object Segmentor module based on Segment Anything Model and 2-D light adapter strategy, to capture the overall surrounding cues without any extra segmentation mask labels. Second, the semantic extractor is adopted to generate the hierarchical semantic-level cues. Subsequently, a fusion module is designed to generate the refined global features by incorporating the class-agnostic object features and semantic-level features using a self-attention mechanism. Finally, vehicle behavior decisions and possible human-centric interpretations are jointly generated based on the global fusion context. The experimental results across various settings on the public datasets demonstrate the superiority and effectiveness of our proposed solution.
期刊介绍:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on knowledge-based and other artificial intelligence techniques-based systems. The journal aims to support human prediction and decision-making through data science and computation techniques, provide a balanced coverage of theory and practical study, and encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.