VRSTNN: Visual-Relational Spatio-Temporal Neural Network for Early Hazardous Event Detection in Automated Driving Systems

IF 14 | Zone 1, Engineering Technology | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Intelligent Vehicles Pub Date : 2024-04-23 DOI:10.1109/TIV.2024.3392589

Dannier Xiao;Mehrdad Dianati;Paul Jennings;Roger Woodman

Volume 9, Issue 11, pp. 7016-7029 | Journal Article | https://ieeexplore.ieee.org/document/10507041/

Citations: 0

Abstract



Reliable and early detection of hazardous events is vital for the safe deployment of automated driving systems. Yet, it remains challenging as road environments can be highly complex and dynamic. State-of-the-art solutions utilise neural networks to learn visual features and temporal patterns from collision videos. However, in this paper, we show how visual features alone may not provide the essential context needed to detect early warning patterns. To address these limitations, we first propose an input encoding that captures the context of the scene. This is achieved by formulating a scene as a graph to provide a framework to represent the arrangement, relationships and behaviours of each road user. We then process the graphs using graph neural networks to identify scene context from: 1) the collective behaviour of nearby road users based on their relationships and 2) local node features that describe individual behaviour. We then propose a novel visual-relational spatio-temporal neural network (VRSTNN) that leverages multi-modal processing to understand scene context and fuse it with the visual characteristics of the scene for more reliable and early hazard detection. Our results show that our VRSTNN outperforms state-of-the-art models in terms of accuracy, F1 and false negative rate on a real and a synthetic benchmark dataset (DOTA and GTAC, respectively).
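The abstract's idea of formulating a scene as a graph can be illustrated with a rough sketch. The code below is not the authors' VRSTNN; it is a minimal pure-Python illustration under stated assumptions: road users are nodes with hypothetical features (position and speed), edges connect users within an assumed distance radius, one round of mean-aggregation message passing stands in for a graph neural network layer, and fusion with visual features is shown as simple concatenation. All names (`RoadUser`, `build_edges`, `message_pass`, `fuse`) and the numeric values are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class RoadUser:
    """One node of the scene graph (fields are illustrative assumptions)."""
    x: float      # position in metres
    y: float
    speed: float  # metres per second


def build_edges(users, radius):
    """Connect every pair of road users that lie within `radius` metres."""
    edges = {i: [] for i in range(len(users))}
    for i, a in enumerate(users):
        for j, b in enumerate(users):
            if i != j and math.hypot(a.x - b.x, a.y - b.y) <= radius:
                edges[i].append(j)
    return edges


def node_features(user):
    """Local node features describing individual behaviour."""
    return [user.x, user.y, user.speed]


def message_pass(features, edges):
    """One round of mean aggregation: each node's own features are
    concatenated with the average of its neighbours' features
    (zeros when the node has no neighbours)."""
    dim = len(features[0])
    out = []
    for i, own in enumerate(features):
        nbrs = edges[i]
        if nbrs:
            mean = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        else:
            mean = [0.0] * dim
        out.append(own + mean)
    return out


def scene_embedding(features):
    """Mean-pool all node vectors into a single scene-context vector."""
    n = len(features)
    return [sum(f[k] for f in features) / n for k in range(len(features[0]))]


def fuse(context, visual):
    """Late fusion by concatenating relational and visual vectors."""
    return context + visual


# Two nearby road users and one distant, stationary one (made-up values).
users = [RoadUser(0.0, 0.0, 10.0), RoadUser(3.0, 4.0, 12.0), RoadUser(50.0, 0.0, 0.0)]
edges = build_edges(users, radius=10.0)
feats = message_pass([node_features(u) for u in users], edges)
context = scene_embedding(feats)
fused = fuse(context, visual=[0.2, 0.7])  # placeholder visual feature vector
```

In this toy scene only the first two road users are linked, so their updated vectors carry each other's features while the distant user keeps a zero neighbourhood summary. A learned model would replace the mean aggregation and concatenation with trainable layers and process a sequence of such graphs over time.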


Source journal: IEEE Transactions on Intelligent Vehicles (Mathematics-Control and Optimization)

CiteScore: 12.10

Self-citation rate: 13.40%

Articles published: 177

About the journal: The IEEE Transactions on Intelligent Vehicles (T-IV) is a premier platform for publishing peer-reviewed articles that present innovative research concepts, application results, significant theoretical findings, and application case studies in the field of intelligent vehicles. With a particular emphasis on automated vehicles within roadway environments, T-IV aims to raise awareness of pressing research and application challenges. Our focus is on providing critical information to the intelligent vehicle community, serving as a dissemination vehicle for IEEE ITS Society members and others interested in learning about the state-of-the-art developments and progress in research and applications related to intelligent vehicles. Join us in advancing knowledge and innovation in this dynamic field.