CNN-ViT: A multi-feature learning based approach for driver drowsiness detection

IF 4.5 · JCR Q2 · Computer Science, Theory & Methods

Array · Pub Date: 2025-06-24 · DOI: 10.1016/j.array.2025.100425

Madduri Venkateswarlu, Venkata Rami Reddy Chirra

Array, Volume 27, Article 100425. Full text: https://www.sciencedirect.com/science/article/pii/S2590005625000529

Citations: 0

Abstract



Driver drowsiness remains a critical contributor to road accidents, frequently resulting in severe injuries and fatalities. To address this issue, the present study proposes an advanced drowsiness detection system that combines the competencies of Convolutional Neural Networks (CNNs) — namely DenseNet121, VGG16, VGG19, and ResNet50 — with a Vision Transformer (ViT). This hybrid framework is designed to harness the complementary strengths of CNNs and transformers: CNNs excel at capturing fine-grained local features, while ViT effectively models global dependencies within images. The input images are processed simultaneously through both branches, and their extracted features are merged and used to classify the driver’s state into one of four categories: Closed, Open, no_yawn, or yawn. The proposed system was evaluated on two separate datasets, named Dataset-1 and Dataset-2. Results demonstrated that the ResNet50_ViT hybrid attained a high accuracy of 99.76% on Dataset-1, while the VGG19_ViT model attained 98.21% on Dataset-2. Performance was assessed using metrics such as accuracy, precision, F1-score, and recall. The strong results, supported by optimized hyperparameters, highlight the reliability and effectiveness of the hybrid model for real-time driver drowsiness detection.
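The two-branch design described in the abstract can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the abstract says only that the branch features are "merged", so concatenation is an assumption here, as are the single linear classification head, the feature sizes, and the helper names `fuse`, `softmax`, and `classify`. The feature vectors are random stand-ins for what a CNN backbone (for example ResNet50) and a ViT would produce, and the head is untrained, so the predicted label carries no meaning beyond showing the data flow.

```python
import math
import random

# Class names as listed in the abstract.
CLASSES = ["Closed", "Open", "no_yawn", "yawn"]


def fuse(cnn_features, vit_features):
    """Merge the two branch outputs by concatenation (an assumed fusion rule)."""
    return list(cnn_features) + list(vit_features)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(fused, weights, biases):
    """Apply one linear layer plus softmax to the fused feature vector."""
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, biases)
    ]
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Stand-in features; real ones would come from the CNN and ViT branches.
rng = random.Random(0)
cnn_features = [rng.random() for _ in range(8)]  # hypothetical size
vit_features = [rng.random() for _ in range(6)]  # hypothetical size
fused = fuse(cnn_features, vit_features)

# Untrained, randomly initialised head, used only to show the shapes involved.
weights = [[rng.uniform(-1, 1) for _ in fused] for _ in CLASSES]
biases = [0.0] * len(CLASSES)

label, probs = classify(fused, weights, biases)
print(label, [round(p, 3) for p in probs])
```

In a trained system the head's weights would be learned jointly with, or on top of, the two feature extractors. Concatenation is only one of several possible fusion rules, and the abstract does not say which one the paper uses.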


Source journal

Array (subject area: Computer Science, General Computer Science)

CiteScore

4.40

Self-citation rate

0.00%

Articles published

Review time

45 days