Driver Gaze Zone Estimation Based on Three-Channel Convolution-Optimized Vision Transformer With Transfer Learning

IF 4.3 | CAS Zone 2 (Multidisciplinary Journals) | JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC
Zhao Li; Siyang Jiang; Rui Fu; Yingshi Guo; Chang Wang
Published in: IEEE Sensors Journal, vol. 24, no. 24, pp. 42064-42078
Publication date: 2024-10-31
Publication type: Journal Article
DOI: 10.1109/JSEN.2024.3486373
URL: https://ieeexplore.ieee.org/document/10740606/
Citations: 0

Abstract

Driver gaze zone estimation (DGZE) is essential for detecting the driver’s state and formulating takeover rules in intelligent driving systems. However, multichannel models based on convolutional neural networks (CNNs) lack global feature extraction capability and have a large number of parameters and high computational complexity. This article therefore proposes a three-channel convolution-optimized vision transformer (3C-CoViT) to estimate the driver’s gaze zone. The method replaces the linear projection in the pure ViT structure with a convolutional projection, converts the input images of the different channels into image sequences, and adds a convolutional feed-forward network to extract the local features of the tokens, strengthen the correlation between spatially adjacent tokens, and improve the performance and efficiency of the model. Following a transfer learning approach, the model is pretrained on the GazeCapture dataset and then fine-tuned on a dataset collected in an on-road experiment. To improve the interpretability of the model, a novel visualization method is presented. Experimental results show that the proposed method accurately identifies driver gaze zones (98.04% average accuracy) and outperforms state-of-the-art methods in accuracy and reliability. Ablation studies confirm the advantage of the proposed method over the pure ViT and the benefits of transfer learning and three-channel input.
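The replacement of a linear projection by a convolutional projection can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `conv_projection` and `embed_three_inputs`, the absence of padding, and the choice to fuse the three inputs by concatenating their token sequences are all assumptions made for illustration, since the abstract does not specify these details.

```python
import numpy as np


def conv_projection(image, kernels, stride):
    """Turn an image of shape (C, H, W) into a sequence of D-dimensional tokens.

    kernels has shape (D, C, k, k). Each output position of the strided
    convolution becomes one token. No padding is used (an assumption).
    """
    c, h, w = image.shape
    d, kc, k, k2 = kernels.shape
    assert kc == c and k == k2, "kernel shape must match the image channels"
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    flat_kernels = kernels.reshape(d, -1)  # (D, C*k*k)
    tokens = np.empty((out_h * out_w, d))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            patch = image[:, top:top + k, left:left + k]
            # One token = every kernel applied to the same local window.
            tokens[i * out_w + j] = flat_kernels @ patch.reshape(-1)
    return tokens, (out_h, out_w)


def embed_three_inputs(images, kernel_sets, stride):
    """Embed three input images separately and concatenate their tokens.

    Concatenation is a hypothetical fusion choice, not taken from the paper.
    """
    sequences = [
        conv_projection(img, ker, stride)[0]
        for img, ker in zip(images, kernel_sets)
    ]
    return np.concatenate(sequences, axis=0)
```

When `stride` equals the kernel size, the windows do not overlap and the result matches the per-patch linear projection of a plain ViT. When `stride` is smaller than the kernel size, neighbouring windows share pixels, which is one way a convolutional projection ties adjacent tokens together.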
Source journal
IEEE Sensors Journal (Engineering Technology, Engineering: Electrical & Electronic)
CiteScore: 7.70
Self-citation rate: 14.00%
Articles published: 2058
Review time: 5.2 months
Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspect of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice