Supervised Contrastive Learning for Detecting Anomalous Driving Behaviours from Multimodal Videos
Shehroz S. Khan, Ziting Shen, Haoying Sun, Ax Patel, A. Abedi
2022 19th Conference on Robots and Vision (CRV)
DOI: 10.1109/CRV55824.2022.00011
Abstract
Distracted driving is one of the major causes of vehicle accidents. Therefore, detecting distracted driving behaviours is of paramount importance for reducing the millions of deaths and injuries occurring worldwide. Distracted or anomalous driving behaviours are deviations from ‘normal’ driving that need to be identified correctly to alert the driver. However, these driving behaviours do not comprise one specific type of driving style, and their distribution can differ between the training and test phases of a classifier. We formulate this problem as a supervised contrastive learning approach to learn a visual representation for detecting normal driving as well as seen and unseen anomalous driving behaviours. We modify the standard contrastive loss function to adjust the similarity of negative pairs to aid the optimization. Normally, in a (self-)supervised contrastive framework, the projection head layers are omitted during the test phase because the encoding layers are considered to contain general visual representations. However, we assert that for a video-based supervised contrastive learning task, including a projection head can be beneficial. We show our results on a driver anomaly detection dataset that contains 783 minutes of video recordings of normal and anomalous driving behaviours of 31 drivers, captured from top and front cameras (both depth and infrared). We also performed an extra step of fine-tuning the labels in this dataset. Out of 9 combinations of video modalities, our proposed contrastive approach improved the ROC AUC on 6 in comparison to the baseline models (improvements ranging from 4.23% to 8.91% across modalities). We performed statistical tests that showed evidence that our proposed method performs better than the baseline contrastive learning setup. Finally, the results showed that the fusion of depth and infrared modalities from the top and front views achieved the best ROC AUC of 0.9738 and PR AUC of 0.9772.
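The abstract's core technical ingredient is a supervised contrastive loss in which the similarity of negative pairs is adjusted to aid optimization, with the projection-head output retained at test time. The exact form of that adjustment is not given in the abstract, so the sketch below is a minimal PyTorch implementation of a SupCon-style loss with a hypothetical `neg_weight` scaling factor standing in for the modification; the function name, `neg_weight` parameter, and example labels are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F


def supervised_contrastive_loss(features, labels, temperature=0.1, neg_weight=1.0):
    """SupCon-style loss with an adjustable weight on negative-pair similarities.

    `neg_weight` is a hypothetical knob standing in for the paper's (unspecified)
    adjustment of negative-pair similarity; neg_weight=1.0 recovers the standard
    supervised contrastive loss.

    features: (N, D) embeddings from the projection head
    labels:   (N,) integer class labels (e.g. normal vs. anomalous driving clips)
    """
    n = features.shape[0]
    features = F.normalize(features, dim=1)          # work in cosine-similarity space
    sim = features @ features.T / temperature        # (N, N) temperature-scaled similarities

    labels = labels.view(-1, 1)
    same_class = torch.eq(labels, labels.T).float()  # 1 where two samples share a label
    self_mask = torch.eye(n, device=features.device)
    pos_mask = same_class - self_mask                # positives, excluding self-pairs
    neg_mask = 1.0 - same_class                      # negatives (different label)

    # Re-scale negative-pair similarities before they enter the softmax denominator.
    sim = sim * (neg_mask * neg_weight + (1.0 - neg_mask))

    # Exclude each sample's self-similarity from the denominator.
    logits = sim - self_mask * 1e9
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)

    # Average log-probability over each anchor's positives, then negate and average.
    pos_count = pos_mask.sum(dim=1).clamp(min=1.0)
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_count
    return loss.mean()


# Example usage with random projection-head outputs for a batch of 8 video clips:
feats = torch.randn(8, 128)
labels = torch.tensor([0, 0, 1, 1, 0, 2, 2, 1])
loss = supervised_contrastive_loss(feats, labels, temperature=0.1, neg_weight=0.8)
```

In such a setup, the `features` fed to the loss (and used for scoring at test time) would come from the projection head rather than the backbone encoder alone, consistent with the abstract's claim that keeping the projection head is beneficial for video-based supervised contrastive learning.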