{"title":"Convolutional Neural Network or Vision Transformer? Benchmarking Various Machine Learning Models for Distracted Driver Detection","authors":"Hong Vin Koay, Joon Huang Chuah, C. Chow","doi":"10.1109/TENCON54134.2021.9707341","DOIUrl":null,"url":null,"abstract":"Driver distraction is the main factor of severe traffic accidents and has become an essential issue in the traffic safety field. Hence, driver inattention systems are crucial in ensuring the safety of road users. With the introduction of Vision Transformer for computer vision tasks, there is a lack of comprehensive evaluation of various models for distracted driver detection. Hence, we raise the question - does vision transformers outperform convolutional neural networks (CNNs) in the field of detecting driving distraction? In this work, we evaluate and perform in-depth evaluations of various state-of-the-art CNN and Vision Transformer models to detect the distracted driver. We believe this will aid future researchers in this field in benchmarking their novel models with state-of-the-art models. We select ResNet, VGGNet, DenseNet, and EfficientNet as the candidates for CNN, while ViT, Swin Transformer, DeiT, and CaiT for Vision Transformer. We perform our benchmark on the American University of Cairo Distracted Driving Dataset (AUC-DDD) which consists of ten distracted classes. It is observed that CNN should be considered first if the downstream task is specific and the available dataset is small. An in-depth discussion and analysis are included in this work.","PeriodicalId":405859,"journal":{"name":"TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)","volume":"27 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TENCON54134.2021.9707341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Driver distraction is a leading cause of severe traffic accidents and has become an essential issue in the field of traffic safety. Hence, driver inattention detection systems are crucial in ensuring the safety of road users. Despite the introduction of the Vision Transformer for computer vision tasks, there is a lack of comprehensive evaluation of such models for distracted driver detection. Hence, we raise the question: do Vision Transformers outperform convolutional neural networks (CNNs) in detecting driver distraction? In this work, we perform in-depth evaluations of various state-of-the-art CNN and Vision Transformer models for detecting distracted drivers. We believe this will aid future researchers in this field in benchmarking their novel models against the state of the art. We select ResNet, VGGNet, DenseNet, and EfficientNet as the CNN candidates, and ViT, Swin Transformer, DeiT, and CaiT as the Vision Transformer candidates. We perform our benchmark on the American University in Cairo Distracted Driver Dataset (AUC-DDD), which consists of ten distraction classes. We observe that CNNs should be considered first when the downstream task is specific and the available dataset is small. An in-depth discussion and analysis are included in this work.
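To make the benchmarking setup concrete, the sketch below shows one plausible way to instantiate the eight architectures named in the abstract with a fresh ten-class head and to measure top-1 accuracy. The paper does not state its training framework or exact model variants, so the `timm` model names, the evaluation loop, and all hyperparameters here are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of a CNN-vs-ViT benchmark on a 10-class dataset such as
# AUC-DDD, using the timm library (pip install timm torch). The specific
# model variants below are assumptions; the paper only names the families.
import timm
import torch
from torch import nn

NUM_CLASSES = 10  # AUC-DDD has ten distraction classes

# Representative variants of the architectures compared in the paper.
CANDIDATES = [
    # CNNs
    "resnet50", "vgg16", "densenet121", "efficientnet_b0",
    # Vision Transformers
    "vit_base_patch16_224", "swin_base_patch4_window7_224",
    "deit_base_patch16_224", "cait_s24_224",
]

def build_model(name: str) -> nn.Module:
    """Create an ImageNet-pretrained backbone with a new 10-class head."""
    return timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)

@torch.no_grad()
def evaluate(model: nn.Module, loader, device: str = "cuda") -> float:
    """Top-1 accuracy of `model` over a DataLoader of (image, label) batches."""
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

With a suitable `DataLoader` over the dataset, one would fine-tune each candidate under identical settings and compare `evaluate(build_model(name), test_loader)` across the list; holding the training recipe fixed across architectures is what makes such a comparison a benchmark rather than a collection of separate results.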