CNN-ViT: A multi-feature learning based approach for driver drowsiness detection
Authors: Madduri Venkateswarlu, Venkata Rami Reddy Chirra
Journal: Array, volume 27, Article 100425 (published 2025-06-24)
DOI: 10.1016/j.array.2025.100425
URL: https://www.sciencedirect.com/science/article/pii/S2590005625000529
Driver drowsiness remains a major contributor to road accidents and frequently results in severe injuries and fatalities. To address this, the study proposes a drowsiness detection system that combines Convolutional Neural Networks (CNNs), namely DenseNet121, VGG16, VGG19, and ResNet50, with a Vision Transformer (ViT). The hybrid framework is designed to exploit the complementary strengths of the two architectures: CNNs excel at capturing fine-grained local features, while the ViT models global dependencies within an image. Each input image is processed in parallel by both branches, and the extracted features are merged and used to classify the driver's state into one of four categories: Closed, Open, no_yawn, or yawn. The system was evaluated on two separate datasets, referred to as Dataset-1 and Dataset-2. The ResNet50_ViT hybrid attained an accuracy of 99.76% on Dataset-1, while the VGG19_ViT model attained 98.21% on Dataset-2. Performance was assessed using metrics including accuracy, precision, recall, and F1-score. These results, obtained with optimized hyperparameters, highlight the reliability and effectiveness of the hybrid model for real-time driver drowsiness detection.
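
For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for the four classes. The abstract does not say how per-class scores were averaged, so macro-averaging is an assumption here, and the function names (`per_class_metrics`, `summarize`) and the sample labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter

# Class names as listed in the abstract.
CLASSES = ["Closed", "Open", "no_yawn", "yawn"]


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Return {class: (precision, recall, f1)} from one-vs-rest counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    result = {}
    for c in classes:
        precision = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        recall = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        result[c] = (precision, recall, f1)
    return result


def summarize(y_true, y_pred, classes=CLASSES):
    """Accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption; the abstract does not specify it.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    scores = per_class_metrics(y_true, y_pred, classes)
    n = len(classes)
    macro = [sum(s[i] for s in scores.values()) / n for i in range(3)]
    return {"accuracy": accuracy, "precision": macro[0],
            "recall": macro[1], "f1": macro[2]}


# Made-up labels for illustration only; not data from the paper.
y_true = ["Closed", "Closed", "Open", "Open",
          "no_yawn", "no_yawn", "yawn", "yawn"]
y_pred = ["Closed", "Open", "Open", "Open",
          "no_yawn", "no_yawn", "yawn", "no_yawn"]
report = summarize(y_true, y_pred)
print(report)
```

Macro-averaging weights each class equally, which matters if the eye-state and yawn classes are imbalanced. A support-weighted average would be an equally plausible reading of the abstract.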