Convolutional Neural Network or Vision Transformer? Benchmarking Various Machine Learning Models for Distracted Driver Detection

Hong Vin Koay, Joon Huang Chuah, C. Chow
{"title":"Convolutional Neural Network or Vision Transformer? Benchmarking Various Machine Learning Models for Distracted Driver Detection","authors":"Hong Vin Koay, Joon Huang Chuah, C. Chow","doi":"10.1109/TENCON54134.2021.9707341","DOIUrl":null,"url":null,"abstract":"Driver distraction is the main factor of severe traffic accidents and has become an essential issue in the traffic safety field. Hence, driver inattention systems are crucial in ensuring the safety of road users. With the introduction of Vision Transformer for computer vision tasks, there is a lack of comprehensive evaluation of various models for distracted driver detection. Hence, we raise the question - does vision transformers outperform convolutional neural networks (CNNs) in the field of detecting driving distraction? In this work, we evaluate and perform in-depth evaluations of various state-of-the-art CNN and Vision Transformer models to detect the distracted driver. We believe this will aid future researchers in this field in benchmarking their novel models with state-of-the-art models. We select ResNet, VGGNet, DenseNet, and EfficientNet as the candidates for CNN, while ViT, Swin Transformer, DeiT, and CaiT for Vision Transformer. We perform our benchmark on the American University of Cairo Distracted Driving Dataset (AUC-DDD) which consists of ten distracted classes. It is observed that CNN should be considered first if the downstream task is specific and the available dataset is small. An in-depth discussion and analysis are included in this work.","PeriodicalId":405859,"journal":{"name":"TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)","volume":"27 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TENCON54134.2021.9707341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Driver distraction is a leading cause of severe traffic accidents and has become an essential issue in the field of traffic safety. Hence, driver inattention detection systems are crucial in ensuring the safety of road users. Despite the recent introduction of the Vision Transformer for computer vision tasks, there has been no comprehensive evaluation of such models for distracted driver detection. Hence, we raise the question: do Vision Transformers outperform convolutional neural networks (CNNs) in detecting driving distraction? In this work, we perform in-depth evaluations of various state-of-the-art CNN and Vision Transformer models for detecting distracted drivers. We believe this will aid future researchers in this field in benchmarking their novel models against the state of the art. We select ResNet, VGGNet, DenseNet, and EfficientNet as the CNN candidates, and ViT, Swin Transformer, DeiT, and CaiT as the Vision Transformer candidates. We perform our benchmark on the American University of Cairo Distracted Driving Dataset (AUC-DDD), which consists of ten distraction classes. We observe that CNNs should be considered first when the downstream task is specific and the available dataset is small. An in-depth discussion and analysis are included in this work.
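To make the benchmarking setup concrete, below is a minimal sketch of how the eight candidate architectures could be instantiated and evaluated for ten-class distraction recognition using PyTorch and the `timm` library. This is not the authors' code: the specific `timm` variants (e.g. `resnet50`, `vit_base_patch16_224`), the use of ImageNet-pretrained weights, and the evaluation loop are illustrative assumptions.

```python
# A minimal sketch (not the authors' code) of a CNN-vs-ViT benchmark on a
# ten-class distracted-driver dataset, assuming PyTorch and the `timm` library.
import timm
import torch
from torch import nn

NUM_CLASSES = 10  # AUC-DDD contains ten distraction classes

# Candidate architectures named in the paper; the exact timm variants
# chosen here (depth/width, patch size) are our assumption.
CANDIDATES = [
    "resnet50",                      # CNN
    "vgg16",                         # CNN
    "densenet121",                   # CNN
    "efficientnet_b0",               # CNN
    "vit_base_patch16_224",          # Vision Transformer
    "swin_base_patch4_window7_224",  # Vision Transformer
    "deit_base_patch16_224",         # Vision Transformer
    "cait_s24_224",                  # Vision Transformer
]

def build_model(name: str) -> nn.Module:
    # timm swaps in a fresh classification head sized to num_classes
    return timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)

@torch.no_grad()
def evaluate(model: nn.Module, loader) -> float:
    """Top-1 accuracy over a labelled DataLoader of driver images."""
    model.eval()
    correct = total = 0
    for images, labels in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total
```

Fine-tuning each candidate with identical data splits, input resolution, and training budget before calling `evaluate` would yield the kind of side-by-side comparison the paper reports.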