End-to-End Spatio-Temporal Attention-Based Lane-Change Intention Prediction from Multi-Perspective Cameras

Zhouqiao Zhao, Zhensong Wei, Danyang Tian, B. Reimer, Pnina Gershon, Ehsan Moradi-Pari
{"title":"End-to-End Spatio-Temporal Attention-Based Lane-Change Intention Prediction from Multi-Perspective Cameras","authors":"Zhouqiao Zhao, Zhensong Wei, Danyang Tian, B. Reimer, Pnina Gershon, Ehsan Moradi-Pari","doi":"10.1109/IV55152.2023.10186602","DOIUrl":null,"url":null,"abstract":"Advanced Driver Assistance Systems (ADAS) with proactive alerts have been used to increase driving safety. Such systems’ performance greatly depends on how accurately and quickly the risky situations and maneuvers are detected. Existing ADAS provide warnings based on the vehicle’s operational status, detection of environments, and the drivers’ overt actions (e.g., using turn signals or steering wheels), which may not give drivers as much as optimal time to react. In this paper, we proposed a spatio-temporal attention-based neural network to predict drivers’ lane-change intention by fusing the videos from both in-cabin and forward perspectives. The Convolutional Neural Network (CNN)-Recursive Neural Network (RNN) network architecture was leveraged to extract both the spatial and temporal information. On top of this network backbone structure, the feature maps from different time steps and perspectives were fused using multi-head self-attention at each resolution of the CNN. The proposed model was trained and evaluated using a processed subset of the MIT Advanced Vehicle Technology (MIT-AVT) dataset which contains synchronized CAN data, 11058-second videos from 3 different views, 548 lane-change events, and 274 non-lane-change events performed by 83 drivers. The results demonstrate that the model achieves 87% F1-score within the 1-second validation window and 70% F1-score within the 5-second validation window with real-time performance.","PeriodicalId":195148,"journal":{"name":"2023 IEEE Intelligent Vehicles Symposium (IV)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE Intelligent Vehicles Symposium (IV)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IV55152.2023.10186602","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Advanced Driver Assistance Systems (ADAS) with proactive alerts have been used to increase driving safety. Such systems’ performance depends greatly on how accurately and quickly risky situations and maneuvers are detected. Existing ADAS issue warnings based on the vehicle’s operational status, detection of the environment, and the driver’s overt actions (e.g., use of the turn signal or steering wheel), which may not leave drivers optimal time to react. In this paper, we propose a spatio-temporal attention-based neural network that predicts drivers’ lane-change intention by fusing videos from the in-cabin and forward perspectives. A Convolutional Neural Network (CNN)–Recurrent Neural Network (RNN) architecture is leveraged to extract both spatial and temporal information. On top of this backbone, feature maps from different time steps and perspectives are fused using multi-head self-attention at each resolution of the CNN. The proposed model was trained and evaluated on a processed subset of the MIT Advanced Vehicle Technology (MIT-AVT) dataset, which contains synchronized CAN data, 11,058 seconds of video from 3 different views, 548 lane-change events, and 274 non-lane-change events performed by 83 drivers. The results demonstrate that the model achieves an 87% F1-score within the 1-second validation window and a 70% F1-score within the 5-second validation window, with real-time performance.
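The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of the general idea, assuming a small stand-in CNN, a GRU, and a single fusion point at the final feature resolution (the paper fuses at each CNN resolution). All layer sizes, the two-class output, and names such as `LaneChangeIntentNet` are illustrative assumptions, not the authors’ implementation.

```python
# Minimal sketch of a CNN-RNN lane-change-intention model with multi-head
# self-attention fusion across time steps and camera views. Hyperparameters
# and layer choices are assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn

class LaneChangeIntentNet(nn.Module):
    def __init__(self, feat_dim=256, num_heads=4, num_classes=2):
        super().__init__()
        # Shared per-frame CNN encoder (stand-in for the paper's backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Multi-head self-attention fuses tokens across time steps and views.
        self.fuse = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        # RNN aggregates the fused per-time-step features.
        self.rnn = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, cabin, forward_view):
        # cabin, forward_view: (B, T, 3, H, W) synchronized video clips.
        B, T = cabin.shape[:2]
        frames = torch.cat([cabin, forward_view], dim=1)   # (B, 2T, 3, H, W)
        feats = self.cnn(frames.flatten(0, 1)).view(B, 2 * T, -1)
        fused, _ = self.fuse(feats, feats, feats)          # attend over time x view
        # Average the two views at each time step before the RNN.
        fused = fused.reshape(B, 2, T, -1).mean(dim=1)     # (B, T, feat_dim)
        _, h = self.rnn(fused)
        return self.head(h[-1])                            # (B, num_classes)

model = LaneChangeIntentNet()
cabin = torch.randn(2, 8, 3, 64, 64)   # batch of 2 clips, 8 frames each
road = torch.randn(2, 8, 3, 64, 64)
print(model(cabin, road).shape)        # torch.Size([2, 2])
```

Concatenating the in-cabin and forward tokens before self-attention lets every time step of one view attend to every time step of the other, which is the spirit of the multi-perspective, multi-time-step fusion described in the abstract.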