Dongjie Liu , Dawei Li , Hongliang Ding , Yang Cao , Kun Gao
{"title":"Beyond vision: A unified transformer with bidirectional attention for predicting driver perceived risk from multi-modal data","authors":"Dongjie Liu , Dawei Li , Hongliang Ding , Yang Cao , Kun Gao","doi":"10.1016/j.trc.2025.105270","DOIUrl":null,"url":null,"abstract":"<div><div>Modeling driver perceived risk (or subjective risk) plays a critical role in improving driving safety, as different drivers often perceive varying levels of risk under identical conditions, prompting adjustments in their driving behavior. Driving is a complex activity involving multiple cognitive and perceptual processes, such as visual information, driver feedback, vehicle dynamics, and traffic and environmental conditions. However, existing models for subjective risk perception have yet to fully address the need for integrating multi-modal data. To address this gap, we present a Transformer-based model aimed at processing multimodal inputs in a unified manner to enhance the prediction of subjective risk perception. Unlike existing methodologies that extract features specific to each modality, it employs embedding layers to transform images, unstructured, and structured fields into visual and text tokens. Subsequently, bi-directional multimodal attention blocks with inter-modal and intra-modal attention mechanisms capture comprehensive representations of traffic scene images, unstructured traffic scene descriptions, structured traffic data, environmental statistics, and demographics. Experimental results show that the proposed unified model achieves superior predictive performance over existing benchmarks while maintaining reasonable interpretability. 
Furthermore, the model is generalizable, making it applicable to various multi-modal prediction tasks across different transportation contexts.</div></div>","PeriodicalId":54417,"journal":{"name":"Transportation Research Part C-Emerging Technologies","volume":"179 ","pages":"Article 105270"},"PeriodicalIF":7.6000,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Transportation Research Part C-Emerging Technologies","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0968090X25002748","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"TRANSPORTATION SCIENCE & TECHNOLOGY","Score":null,"Total":0}
Citations: 0
Abstract
Modeling driver perceived risk (or subjective risk) plays a critical role in improving driving safety, as different drivers often perceive varying levels of risk under identical conditions, prompting adjustments in their driving behavior. Driving is a complex activity involving multiple cognitive and perceptual processes drawing on visual information, driver feedback, vehicle dynamics, and traffic and environmental conditions. However, existing models of subjective risk perception have yet to fully address the need to integrate multi-modal data. To address this gap, we present a Transformer-based model that processes multimodal inputs in a unified manner to enhance the prediction of subjective risk perception. Unlike existing methodologies that extract features specific to each modality, it employs embedding layers to transform images, unstructured text, and structured fields into visual and text tokens. Subsequently, bidirectional multimodal attention blocks with inter-modal and intra-modal attention mechanisms capture comprehensive representations of traffic scene images, unstructured traffic scene descriptions, structured traffic data, environmental statistics, and demographics. Experimental results show that the proposed unified model achieves superior predictive performance over existing benchmarks while maintaining reasonable interpretability. Furthermore, the model is generalizable, making it applicable to various multi-modal prediction tasks across different transportation contexts.
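The bidirectional attention idea from the abstract can be illustrated in a minimal sketch: each modality first attends to itself (intra-modal), then queries the other modality (inter-modal) in both directions. This is an illustrative toy in NumPy, not the paper's actual architecture; the function names, single-head attention, token counts, and residual fusion scheme are all assumptions.

```python
import numpy as np

def attention(q, k, v):
    """Single-head scaled dot-product attention: q is (n_q, d), k and v are (n_k, d)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

def bidirectional_block(vis, txt):
    """One toy bidirectional multimodal block: intra-modal self-attention
    followed by inter-modal cross-attention in both directions."""
    vis = vis + attention(vis, vis, vis)   # visual tokens attend to visual tokens
    txt = txt + attention(txt, txt, txt)   # text tokens attend to text tokens
    vis2 = vis + attention(vis, txt, txt)  # visual queries text (inter-modal)
    txt2 = txt + attention(txt, vis, vis)  # text queries visual (inter-modal)
    return vis2, txt2

rng = np.random.default_rng(0)
v = rng.normal(size=(16, 64))  # e.g. patch tokens from a traffic-scene image
t = rng.normal(size=(10, 64))  # e.g. tokens from scene descriptions and structured fields
v_out, t_out = bidirectional_block(v, t)
print(v_out.shape, t_out.shape)  # (16, 64) (10, 64)
```

Because both token streams pass through the same block and keep their shapes, blocks like this can be stacked, letting visual and textual representations refine each other layer by layer, which is the unified treatment of modalities the abstract emphasizes.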
About the journal:
Transportation Research: Part C (TR_C) is dedicated to showcasing high-quality, scholarly research that delves into the development, applications, and implications of transportation systems and emerging technologies. Our focus lies not solely on individual technologies, but rather on their broader implications for the planning, design, operation, control, maintenance, and rehabilitation of transportation systems, services, and components. In essence, the intellectual core of the journal revolves around the transportation aspect rather than the technology itself. We actively encourage the integration of quantitative methods from diverse fields such as operations research, control systems, complex networks, computer science, and artificial intelligence. Join us in exploring the intersection of transportation systems and emerging technologies to drive innovation and progress in the field.