‘Disengage AND Integrate’: Personalized Causal Network for Gaze Estimation

Impact factor: 13.7

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2025-06-06. DOI: 10.1109/TIP.2025.3575238

Yi Tian;Xiyun Wang;Sihui Zhang;Wanru Xu;Yi Jin;Yaping Huang

Volume 34, pages 3733-3747. Publication type: Journal Article. Source: https://ieeexplore.ieee.org/document/11026798/

Citations: 0

Abstract


‘Disengage AND Integrate’: Personalized Causal Network for Gaze Estimation

The gaze estimation task aims to predict a 3D gaze direction or a 2D gaze point from a face or eye image. To improve the generalization of gaze estimation models to unseen users, existing methods either disentangle the personalized information of all subjects from their gaze features or integrate unrefined personalized information into blended embeddings. These methodologies are not rigorous, and their performance remains unsatisfactory. In this paper, we put forward a comprehensive perspective named ‘Disengage AND Integrate’ for dealing with personalized information: for a specified user, irrelevant personalized information should be discarded while relevant personalized information should be taken into account. Accordingly, we propose a novel Personalized Causal Network (PCNet) for generalizable gaze estimation. PCNet adopts a two-branch framework consisting of a subject-deconfounded appearance sub-network (SdeANet) and a prototypical personalization sub-network (ProPNet). SdeANet explores the causal relationships among facial images, gazes, and personalized information, and extracts a subject-invariant appearance-aware feature from each image by means of causal intervention. ProPNet characterizes customized personalization-aware features of arbitrary users with the help of a prototype-based subject identification task. Furthermore, the whole PCNet is optimized in a hybrid episodic training paradigm, which further improves its adaptability to new users. Experiments on three challenging datasets, covering within-domain and cross-domain gaze estimation tasks, demonstrate the effectiveness of our method.
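The abstract does not say how ProPNet builds or uses its prototypes, so the following is only a minimal sketch of the generic prototype-based identification idea, not the authors' implementation. It assumes that each subject's prototype is the mean of that subject's embedding vectors and that a query is assigned to the nearest prototype by Euclidean distance. The function names `compute_prototypes` and `identify_subject`, the subject labels, and the toy 2-D embeddings are all hypothetical.

```python
import math
from collections import defaultdict


def compute_prototypes(embeddings, subject_ids):
    """Average each subject's embeddings to get one prototype per subject.

    Assumption: prototype = per-subject mean (the usual prototypical-network
    recipe); the paper's actual ProPNet may differ.
    """
    grouped = defaultdict(list)
    for vec, sid in zip(embeddings, subject_ids):
        grouped[sid].append(vec)
    prototypes = {}
    for sid, vecs in grouped.items():
        dim = len(vecs[0])
        prototypes[sid] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return prototypes


def identify_subject(query, prototypes):
    """Return the subject whose prototype is closest (Euclidean) to the query."""
    return min(prototypes, key=lambda sid: math.dist(query, prototypes[sid]))


# Toy support set: 2-D embeddings from two hypothetical subjects.
support = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
labels = ["subject_a", "subject_a", "subject_b", "subject_b"]

prototypes = compute_prototypes(support, labels)
predicted = identify_subject([0.9, 1.0], prototypes)
print(predicted)
```

In an episodic setup, a support set like the one above would be resampled for every training episode, so the model repeatedly practises adapting to a small group of subjects; how the paper's hybrid episodic paradigm does this is not described in the abstract.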


Source journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Number of publications