Title: Vision Transformers for Road Accident Detection from Dashboard Cameras
Authors: Feten Hajri, H. Fradi
Venue: 2022 18th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS)
Publication date: 2022-11-29
DOI: 10.1109/AVSS56176.2022.9959545
URL: https://doi.org/10.1109/AVSS56176.2022.9959545
Citations: 3
Abstract
Road accidents are increasing at a worrying rate and have become one of the major concerns in road traffic monitoring. Their detection is becoming an essential component of intelligent traffic management systems. Unlike most existing anomaly detection systems, which mainly monitor traffic status from static cameras, this paper focuses on the more challenging scenario of dashboard cameras. To handle this problem, we propose to adopt vision transformers with positional embeddings, based on the multi-head attention mechanism, for traffic monitoring, following the increasing development of such models in the natural language processing and computer vision communities. Specifically, to identify accidents while exploiting the spatio-temporal nature of videos, we employ a hybrid architecture. This architecture has the advantage of combining convolutional layers, which capture local correlations between different patterns within the same image, with a vision transformer, which learns the sequential correlations between the extracted features. Extensive experiments on two popular datasets, DAD and CCD, have been conducted to demonstrate the effectiveness of the proposed approach in terms of detection accuracy. The obtained results are compared with those of recurrent neural network models commonly used to process sequential input data, such as CNN-RNN, Conv-LSTM, and LCRN.
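The temporal half of the hybrid architecture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that each video frame has already been reduced to a feature vector by a convolutional backbone (not shown), and it uses fixed sinusoidal positional embeddings, randomly initialised projection weights in place of trained ones, a single attention layer, and mean pooling before a sigmoid output. All function names, dimensions, and the pooling and classification steps are hypothetical choices made for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def positional_embedding(num_frames, dim):
    """Sinusoidal embeddings (an assumption; learned embeddings are also common)."""
    pe = []
    for pos in range(num_frames):
        row = []
        for i in range(dim):
            angle = pos / (10000 ** (2 * (i // 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained projection weights."""
    scale = 1.0 / math.sqrt(rows)
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def multi_head_self_attention(x, num_heads, rng):
    """Self-attention across frames; returns (outputs, per-head attention maps)."""
    dim = len(x[0])
    assert dim % num_heads == 0, "feature size must be divisible by num_heads"
    head_dim = dim // num_heads
    q = matmul(x, random_matrix(dim, dim, rng))
    k = matmul(x, random_matrix(dim, dim, rng))
    v = matmul(x, random_matrix(dim, dim, rng))
    head_outputs, attention_maps = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        qh = [row[lo:hi] for row in q]
        kh = [row[lo:hi] for row in k]
        vh = [row[lo:hi] for row in v]
        # scores[i][j] measures how strongly frame i attends to frame j.
        scores = matmul(qh, [list(col) for col in zip(*kh)])
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        attention_maps.append(weights)
        head_outputs.append(matmul(weights, vh))
    # Concatenate the heads along the feature dimension, then project.
    concat = [sum((head[t] for head in head_outputs), []) for t in range(len(x))]
    return matmul(concat, random_matrix(dim, dim, rng)), attention_maps


def classify_clip(frame_features, num_heads=2, seed=0):
    """Return an (untrained) accident probability and the attention maps."""
    rng = random.Random(seed)
    num_frames, dim = len(frame_features), len(frame_features[0])
    pe = positional_embedding(num_frames, dim)
    # Add frame-order information to the per-frame CNN features.
    x = [[f + p for f, p in zip(fr, pr)] for fr, pr in zip(frame_features, pe)]
    attended, attention_maps = multi_head_self_attention(x, num_heads, rng)
    # Mean-pool over time, then apply a hypothetical linear + sigmoid head.
    pooled = [sum(col) / num_frames for col in zip(*attended)]
    w = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    logit = sum(a * b for a, b in zip(pooled, w))
    return 1.0 / (1.0 + math.exp(-logit)), attention_maps


if __name__ == "__main__":
    demo_rng = random.Random(1)
    # Five frames with eight features each, standing in for CNN outputs.
    clip = [[demo_rng.random() for _ in range(8)] for _ in range(5)]
    probability, maps = classify_clip(clip)
    print(f"untrained accident probability: {probability:.3f}")
    print(f"heads: {len(maps)}, attention map size: {len(maps[0])}x{len(maps[0][0])}")
```

Because the weights are random, the printed probability carries no meaning; the sketch only shows how positional embeddings and multi-head attention let every frame's features be weighted against every other frame in the clip, which is the role the abstract assigns to the transformer.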