Contrastive Learning with Video Transformer for Driver Distraction Detection through Multiview and Multimodal Video

Hong Vin Koay, Joon Huang Chuah, C. Chow
Published in: 2023 IEEE Region 10 Symposium (TENSYMP), 2023-09-06. DOI: 10.1109/TENSYMP55890.2023.10223643 (https://doi.org/10.1109/TENSYMP55890.2023.10223643)

Abstract

Distracted drivers are more likely to be involved in fatal accidents, so detecting actions that may lead to distraction should be prioritized to reduce road accidents. However, many different actions can cause a driver to shift attention away from the road. Previous work on detecting distracted drivers relies on a predefined set of actions that are considered distractions. Such a dataset is known as a ‘closed set’, since many distracting actions remain that the model never considers. In contrast to previous datasets and approaches, in this work we use contrastive learning to detect distraction from multiview and multimodal video. The dataset used is the Driver Anomaly Detection dataset. The model is tasked with distinguishing normal from anomalous driving in an ‘open set’ manner, where the test set contains anomalous driving conditions not seen during training. We use a Video Transformer as the model backbone and validate that it performs better than convolution-based backbones. Two views (front and top) of driving clips in two modalities (IR and depth) are used to train individual models, and the results from the different views and modalities are subsequently fused. Our method achieves 0.9892 AUC and 97.02% accuracy with Swin-Tiny when considering both views and modalities.
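The abstract does not say how the per-view and per-modality results are fused or how the AUC is computed, so the following is only a minimal sketch under stated assumptions: each of the four streams (front/top view, IR/depth modality) is assumed to output one anomaly score per clip, fusion is assumed to be a plain average of those scores, and the AUC is computed from its pairwise-ranking definition. The function names, stream keys, and score values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def fuse_scores(stream_scores):
    """Average per-clip anomaly scores across streams.

    Plain averaging is an assumption made for illustration; the paper's
    actual fusion rule is not given in the abstract.
    """
    streams = list(stream_scores.values())
    n_clips = len(streams[0])
    if any(len(s) != n_clips for s in streams):
        raise ValueError("all streams must score the same clips")
    # zip(*streams) groups the scores that belong to the same clip
    return [mean(clip_scores) for clip_scores in zip(*streams)]


def roc_auc(labels, scores):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    Label 1 marks an anomalous clip, 0 a normal one; ties count as half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one clip of each class")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Made-up scores for four clips from four hypothetical streams
scores = {
    "front_ir": [0.1, 0.7, 0.2, 0.9],
    "front_depth": [0.2, 0.6, 0.4, 0.8],
    "top_ir": [0.3, 0.4, 0.1, 0.7],
    "top_depth": [0.2, 0.5, 0.3, 0.6],
}
labels = [0, 1, 0, 1]  # made-up ground truth

fused = fuse_scores(scores)
auc = roc_auc(labels, fused)
print(fused, auc)
```

Averaging scores after each model has run (late fusion) keeps the four models independent, which matches the abstract's description of training an individual model per view and modality; other fusion rules, such as weighted sums, would fit the same interface.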