{"title":"Convolutional Neural Network or Vision Transformer? Benchmarking Various Machine Learning Models for Distracted Driver Detection","authors":"Hong Vin Koay, Joon Huang Chuah, C. Chow","doi":"10.1109/TENCON54134.2021.9707341","DOIUrl":null,"url":null,"abstract":"Driver distraction is the main factor of severe traffic accidents and has become an essential issue in the traffic safety field. Hence, driver inattention systems are crucial in ensuring the safety of road users. With the introduction of Vision Transformer for computer vision tasks, there is a lack of comprehensive evaluation of various models for distracted driver detection. Hence, we raise the question - does vision transformers outperform convolutional neural networks (CNNs) in the field of detecting driving distraction? In this work, we evaluate and perform in-depth evaluations of various state-of-the-art CNN and Vision Transformer models to detect the distracted driver. We believe this will aid future researchers in this field in benchmarking their novel models with state-of-the-art models. We select ResNet, VGGNet, DenseNet, and EfficientNet as the candidates for CNN, while ViT, Swin Transformer, DeiT, and CaiT for Vision Transformer. We perform our benchmark on the American University of Cairo Distracted Driving Dataset (AUC-DDD) which consists of ten distracted classes. It is observed that CNN should be considered first if the downstream task is specific and the available dataset is small. An in-depth discussion and analysis are included in this work.","PeriodicalId":405859,"journal":{"name":"TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)","volume":"27 3","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"TENCON 2021 - 2021 IEEE Region 10 Conference (TENCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TENCON54134.2021.9707341","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Driver distraction is a leading cause of severe traffic accidents and has become an essential issue in the field of traffic safety. Hence, driver inattention detection systems are crucial in ensuring the safety of road users. Despite the introduction of the Vision Transformer for computer vision tasks, there is a lack of comprehensive evaluation of such models for distracted driver detection. Hence, we raise the question: do Vision Transformers outperform convolutional neural networks (CNNs) in detecting driver distraction? In this work, we perform in-depth evaluations of various state-of-the-art CNN and Vision Transformer models for detecting distracted drivers. We believe this will aid future researchers in this field in benchmarking their novel models against the state of the art. We select ResNet, VGGNet, DenseNet, and EfficientNet as the CNN candidates, and ViT, Swin Transformer, DeiT, and CaiT as the Vision Transformer candidates. We perform our benchmark on the American University in Cairo Distracted Driver Dataset (AUC-DDD), which consists of ten distraction classes. We observe that CNNs should be considered first when the downstream task is specific and the available dataset is small. An in-depth discussion and analysis are included in this work.
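To make the benchmarking setup concrete, the sketch below shows one plausible way to instantiate the eight architectures named in the abstract with a fresh ten-class head and to measure top-1 accuracy. The paper does not state its training framework or exact model variants, so the `timm` model names, the evaluation loop, and all hyperparameters here are illustrative assumptions, not the authors' configuration.

```python
# Hedged sketch of a CNN-vs-ViT benchmark on a 10-class dataset such as
# AUC-DDD, using the timm library (pip install timm torch). The specific
# model variants below are assumptions; the paper only names the families.
import timm
import torch
from torch import nn

NUM_CLASSES = 10  # AUC-DDD has ten distraction classes

# Representative variants of the architectures compared in the paper.
CANDIDATES = [
    # CNNs
    "resnet50", "vgg16", "densenet121", "efficientnet_b0",
    # Vision Transformers
    "vit_base_patch16_224", "swin_base_patch4_window7_224",
    "deit_base_patch16_224", "cait_s24_224",
]

def build_model(name: str) -> nn.Module:
    """Create an ImageNet-pretrained backbone with a new 10-class head."""
    return timm.create_model(name, pretrained=True, num_classes=NUM_CLASSES)

@torch.no_grad()
def evaluate(model: nn.Module, loader, device: str = "cuda") -> float:
    """Top-1 accuracy of `model` over a DataLoader of (image, label) batches."""
    model.eval().to(device)
    correct = total = 0
    for images, labels in loader:
        logits = model(images.to(device))
        correct += (logits.argmax(dim=1).cpu() == labels).sum().item()
        total += labels.size(0)
    return correct / total
```

With a suitable `DataLoader` over the dataset, one would fine-tune each candidate under identical settings and compare `evaluate(build_model(name), test_loader)` across the list; holding the training recipe fixed across architectures is what makes such a comparison a benchmark rather than a collection of separate results.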