CNN-ViT: A multi-feature learning based approach for driver drowsiness detection

Impact factor: 4.5 (JCR Q2, Computer Science, Theory & Methods)
Journal: Array. Published: 2025-06-24. DOI: 10.1016/j.array.2025.100425
Madduri Venkateswarlu, Venkata Rami Reddy Chirra
Array, volume 27, Article 100425 (journal article). Full text: https://www.sciencedirect.com/science/article/pii/S2590005625000529
Citations: 0

Abstract

Driver drowsiness remains a major contributor to road accidents, frequently resulting in severe injuries and fatalities. To address this issue, the present study proposes a drowsiness detection system that combines Convolutional Neural Networks (CNNs), namely DenseNet121, VGG16, VGG19, and ResNet50, with a Vision Transformer (ViT). The hybrid framework is designed to exploit the complementary strengths of the two model families: CNNs excel at capturing fine-grained local features, while the ViT models global dependencies within an image. Each input image is processed by both branches in parallel, and the extracted features are merged and used to classify the driver's state into one of four categories: Closed, Open, no_yawn, or yawn. The system was evaluated on two separate datasets, Dataset-1 and Dataset-2. The ResNet50_ViT hybrid attained an accuracy of 99.76% on Dataset-1, while the VGG19_ViT model attained 98.21% on Dataset-2. Performance was assessed using accuracy, precision, recall, and F1-score. These results, obtained with optimized hyperparameters, indicate that the hybrid model is reliable and effective for real-time driver drowsiness detection.
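The abstract names accuracy, precision, recall, and F1-score as the evaluation metrics but does not say how they are aggregated over the four classes. The following is a minimal sketch using the standard one-vs-rest definitions and macro averaging, which is an assumption and not necessarily the authors' procedure. The function names and the toy labels are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Class names as given in the abstract.
CLASSES = ["Closed", "Open", "no_yawn", "yawn"]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} from one-vs-rest counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was the truth but was missed
    out = {}
    for c in classes:
        pred_total = tp[c] + fp[c]
        true_total = tp[c] + fn[c]
        precision = tp[c] / pred_total if pred_total else 0.0
        recall = tp[c] / true_total if true_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        out[c] = (precision, recall, f1)
    return out


def macro_average(metrics):
    """Unweighted mean of (precision, recall, f1) over classes (assumed)."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics.values()) / n for i in range(3))


# Toy labels, invented for illustration only (not data from the paper).
y_true = ["Closed", "Closed", "Open", "Open",
          "no_yawn", "no_yawn", "yawn", "yawn"]
y_pred = ["Closed", "Open", "Open", "Open",
          "no_yawn", "no_yawn", "yawn", "no_yawn"]

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro (P, R, F1):",
          macro_average(per_class_metrics(y_true, y_pred)))
```

Macro averaging weights every class equally, which matters if one state (for example yawn) is rarer than the others; a sample-weighted average would be an equally plausible reading of the abstract.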
Source journal: Array
Subject area: Computer Science, General Computer Science
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 93
Review time: 45 days