Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions

IF 7.9 | Zone 1, Engineering & Technology | Q1, ENGINEERING, CIVIL
Yuchao Feng;Zhen Feng;Wei Hua;Yuxiang Sun
{"title":"多模态-XAD:基于多模态环境描述的可解释自主驾驶","authors":"Yuchao Feng;Zhen Feng;Wei Hua;Yuxiang Sun","doi":"10.1109/TITS.2024.3467175","DOIUrl":null,"url":null,"abstract":"In recent years, deep learning-based end-to-end autonomous driving has become increasingly popular. However, deep neural networks are like black boxes. Their outputs are generally not explainable, making them not reliable to be used in real-world environments. To provide a solution to this problem, we propose an explainable deep neural network that jointly predicts driving actions and multimodal environment descriptions of traffic scenes, including bird-eye-view (BEV) maps and natural-language environment descriptions. In this network, both the context information from BEV perception and the local information from semantic perception are considered before producing the driving actions and natural-language environment descriptions. To evaluate our network, we build a new dataset with hand-labelled ground truth for driving actions and multimodal environment descriptions. Experimental results show that the combination of context information and local information enhances the prediction performance of driving action and environment description, thereby improving the safety and explainability of our end-to-end autonomous driving network.","PeriodicalId":13416,"journal":{"name":"IEEE Transactions on Intelligent Transportation Systems","volume":"25 12","pages":"19469-19481"},"PeriodicalIF":7.9000,"publicationDate":"2024-10-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multimodal-XAD: Explainable Autonomous Driving Based on Multimodal Environment Descriptions\",\"authors\":\"Yuchao Feng;Zhen Feng;Wei Hua;Yuxiang Sun\",\"doi\":\"10.1109/TITS.2024.3467175\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, deep learning-based end-to-end autonomous driving has become increasingly popular. However, deep neural networks are like black boxes. Their outputs are generally not explainable, making them not reliable to be used in real-world environments. To provide a solution to this problem, we propose an explainable deep neural network that jointly predicts driving actions and multimodal environment descriptions of traffic scenes, including bird-eye-view (BEV) maps and natural-language environment descriptions. In this network, both the context information from BEV perception and the local information from semantic perception are considered before producing the driving actions and natural-language environment descriptions. To evaluate our network, we build a new dataset with hand-labelled ground truth for driving actions and multimodal environment descriptions. 
Experimental results show that the combination of context information and local information enhances the prediction performance of driving action and environment description, thereby improving the safety and explainability of our end-to-end autonomous driving network.\",\"PeriodicalId\":13416,\"journal\":{\"name\":\"IEEE Transactions on Intelligent Transportation Systems\",\"volume\":\"25 12\",\"pages\":\"19469-19481\"},\"PeriodicalIF\":7.9000,\"publicationDate\":\"2024-10-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Intelligent Transportation Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10706985/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, CIVIL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Transportation Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10706985/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, CIVIL","Score":null,"Total":0}
Citations: 0

Abstract

In recent years, deep learning-based end-to-end autonomous driving has become increasingly popular. However, deep neural networks are like black boxes. Their outputs are generally not explainable, making them not reliable to be used in real-world environments. To provide a solution to this problem, we propose an explainable deep neural network that jointly predicts driving actions and multimodal environment descriptions of traffic scenes, including bird-eye-view (BEV) maps and natural-language environment descriptions. In this network, both the context information from BEV perception and the local information from semantic perception are considered before producing the driving actions and natural-language environment descriptions. To evaluate our network, we build a new dataset with hand-labelled ground truth for driving actions and multimodal environment descriptions. Experimental results show that the combination of context information and local information enhances the prediction performance of driving action and environment description, thereby improving the safety and explainability of our end-to-end autonomous driving network.
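
The abstract gives no implementation details. Purely as an illustrative sketch (assuming a PyTorch-style implementation; all module and parameter names below are hypothetical and not taken from the paper), a network that fuses global BEV context features with local semantic features and jointly predicts a driving action, a coarse BEV map, and natural-language description logits could be structured as follows:

```python
# Hypothetical sketch only, not the authors' implementation: fuse BEV context
# features with local semantic features, then jointly predict a driving action,
# a coarse BEV semantic map, and natural-language description logits.
import torch
import torch.nn as nn


class MultimodalXADSketch(nn.Module):
    def __init__(self, feat_dim=256, num_actions=4, num_bev_classes=8,
                 bev_size=32, vocab_size=1000):
        super().__init__()
        self.num_bev_classes = num_bev_classes
        self.bev_size = bev_size
        # Global context branch (e.g. features from BEV perception).
        self.context_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Local branch (e.g. features from semantic perception of camera images).
        self.local_encoder = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # Fuse context and local information before the prediction heads.
        self.fuse = nn.Linear(2 * feat_dim, feat_dim)
        self.action_head = nn.Linear(feat_dim, num_actions)                   # driving action
        self.bev_head = nn.Linear(feat_dim, num_bev_classes * bev_size ** 2)  # coarse BEV map
        self.text_head = nn.Linear(feat_dim, vocab_size)                      # description logits

    def forward(self, bev_input, camera_input):
        context = self.context_encoder(bev_input)   # context information (BEV perception)
        local = self.local_encoder(camera_input)    # local information (semantic perception)
        fused = torch.relu(self.fuse(torch.cat([context, local], dim=-1)))
        action_logits = self.action_head(fused)
        bev_map = self.bev_head(fused).view(-1, self.num_bev_classes,
                                            self.bev_size, self.bev_size)
        text_logits = self.text_head(fused)          # a real model would use a text decoder
        return action_logits, bev_map, text_logits
```

In a real system the text head would be a sequence decoder and the BEV head a dense decoder; the sketch only illustrates the idea described in the abstract, namely that shared fused features drive all three outputs, so the predicted environment descriptions can serve as an explanation of the predicted driving action.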
Source journal
IEEE Transactions on Intelligent Transportation Systems (Engineering & Technology: Electrical & Electronic Engineering)
CiteScore: 14.80
Self-citation rate: 12.90%
Articles per year: 1872
Review time: 7.5 months
Journal description: The journal covers the theoretical, experimental, and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as systems that use synergistic technologies and systems-engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes promoting, consolidating, and coordinating ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.