UnMoDE：基于特征解纠缠的驾驶员注视估计的不确定性建模

IF 8.4 1区工程技术 Q1 ENGINEERING, CIVIL

IEEE Transactions on Intelligent Transportation Systems Pub Date : 2025-04-14 DOI:10.1109/TITS.2025.3556553

Daosong Hu;Mingyue Cui;Kai Huang

{"title":"UnMoDE：基于特征解纠缠的驾驶员注视估计的不确定性建模","authors":"Daosong Hu;Mingyue Cui;Kai Huang","doi":"10.1109/TITS.2025.3556553","DOIUrl":null,"url":null,"abstract":"Gaze estimation can be used for assessing the attention level of drivers. Current works predominantly focus on enhancing model accuracy, often overlooking the influence of input sample and label uncertainty. In this paper, we propose a framework for uncertainty modeling in driver gaze estimation via feature disentanglement, referred to as UnMoDE. Our approach begins by extracting facial information into distinct feature spaces using an asymmetric dual-branch encoder to obtain gaze features. Subsequently, a multi-layer perceptron (MLP) is employed to project gaze features and labels into an embedding space, representing them as Gaussian distributions. The uncertainty is described using a covariance matrix. Random sampling is applied to derive samples from the gaze embedding distribution to estimate the most probable embedding representation. This estimated representation is then used to regress the gaze direction and is projected back into the gaze feature space, along with identity information, to facilitate facial reconstruction. Extensive experimental evaluations demonstrate that UnMoDE significantly outperforms baseline and state-of-the-art methods on the latest benchmark datasets collected for drivers, particularly in reducing the number of samples with significant errors.","PeriodicalId":13416,"journal":{"name":"IEEE Transactions on Intelligent Transportation Systems","volume":"26 7","pages":"10612-10622"},"PeriodicalIF":8.4000,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"UnMoDE: Uncertainty Modeling for Driver Gaze Estimation via Feature Disentanglement\",\"authors\":\"Daosong Hu;Mingyue Cui;Kai Huang\",\"doi\":\"10.1109/TITS.2025.3556553\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Gaze estimation can be used for assessing the attention level of drivers. Current works predominantly focus on enhancing model accuracy, often overlooking the influence of input sample and label uncertainty. In this paper, we propose a framework for uncertainty modeling in driver gaze estimation via feature disentanglement, referred to as UnMoDE. Our approach begins by extracting facial information into distinct feature spaces using an asymmetric dual-branch encoder to obtain gaze features. Subsequently, a multi-layer perceptron (MLP) is employed to project gaze features and labels into an embedding space, representing them as Gaussian distributions. The uncertainty is described using a covariance matrix. Random sampling is applied to derive samples from the gaze embedding distribution to estimate the most probable embedding representation. This estimated representation is then used to regress the gaze direction and is projected back into the gaze feature space, along with identity information, to facilitate facial reconstruction. Extensive experimental evaluations demonstrate that UnMoDE significantly outperforms baseline and state-of-the-art methods on the latest benchmark datasets collected for drivers, particularly in reducing the number of samples with significant errors.\",\"PeriodicalId\":13416,\"journal\":{\"name\":\"IEEE Transactions on Intelligent Transportation Systems\",\"volume\":\"26 7\",\"pages\":\"10612-10622\"},\"PeriodicalIF\":8.4000,\"publicationDate\":\"2025-04-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Intelligent Transportation Systems\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10964529/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, CIVIL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Intelligent Transportation Systems","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10964529/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, CIVIL","Score":null,"Total":0}

引用次数: 0

摘要

凝视估计可用于评估驾驶员的注意力水平。目前的工作主要集中在提高模型的准确性，往往忽略了输入样本和标签不确定性的影响。在本文中，我们提出了一种基于特征解纠缠的不确定性建模框架，称为UnMoDE。我们的方法首先使用非对称双分支编码器将面部信息提取到不同的特征空间中以获得凝视特征。随后，使用多层感知器（MLP）将注视特征和标签投影到嵌入空间中，并将其表示为高斯分布。不确定性用协方差矩阵来描述。采用随机抽样方法从凝视嵌入分布中提取样本，以估计最可能的嵌入表示。然后使用该估计的表示来回归凝视方向，并与身份信息一起投影回凝视特征空间，以促进面部重建。大量的实验评估表明，在为驾驶员收集的最新基准数据集上，UnMoDE显著优于基线和最先进的方法，特别是在减少具有重大错误的样本数量方面。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

UnMoDE: Uncertainty Modeling for Driver Gaze Estimation via Feature Disentanglement

Gaze estimation can be used for assessing the attention level of drivers. Current works predominantly focus on enhancing model accuracy, often overlooking the influence of input sample and label uncertainty. In this paper, we propose a framework for uncertainty modeling in driver gaze estimation via feature disentanglement, referred to as UnMoDE. Our approach begins by extracting facial information into distinct feature spaces using an asymmetric dual-branch encoder to obtain gaze features. Subsequently, a multi-layer perceptron (MLP) is employed to project gaze features and labels into an embedding space, representing them as Gaussian distributions. The uncertainty is described using a covariance matrix. Random sampling is applied to derive samples from the gaze embedding distribution to estimate the most probable embedding representation. This estimated representation is then used to regress the gaze direction and is projected back into the gaze feature space, along with identity information, to facilitate facial reconstruction. Extensive experimental evaluations demonstrate that UnMoDE significantly outperforms baseline and state-of-the-art methods on the latest benchmark datasets collected for drivers, particularly in reducing the number of samples with significant errors.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IEEE Transactions on Intelligent Transportation Systems 工程技术-工程：电子与电气

CiteScore

14.80

自引率

12.90%

发文量

1872

审稿时长

7.5 months

期刊介绍： The theoretical, experimental and operational aspects of electrical and electronics engineering and information technologies as applied to Intelligent Transportation Systems (ITS). Intelligent Transportation Systems are defined as those systems utilizing synergistic technologies and systems engineering concepts to develop and improve transportation systems of all kinds. The scope of this interdisciplinary activity includes the promotion, consolidation and coordination of ITS technical activities among IEEE entities, and providing a focus for cooperative activities, both internally and externally.