Unsupervised Gaze Representation Learning by Switching Features

IF 18.6

IEEE Transactions on Pattern Analysis and Machine Intelligence. Published: 2025-08-19. DOI: 10.1109/TPAMI.2025.3600680

Yunjia Sun, Jiabei Zeng, Shiguang Shan, Xilin Chen

Citations: 0

Abstract

When large-scale annotated datasets are difficult to collect, it is common to train deep learning models on unlabeled data. For 3D gaze estimation, however, most existing unsupervised learning methods struggle to distinguish the subtle gaze-relevant information from the dominant gaze-irrelevant information. To address this issue, we propose an unsupervised learning framework that disentangles gaze-relevant from gaze-irrelevant information by seeking the information shared by a pair of input images with the same gaze and by a pair with the same eye, respectively. Specifically, given two images, the framework finds their shared information by first encoding the images into two latent features via two encoders and then switching part of the features before feeding them to the decoders for image reconstruction. We theoretically prove that, if the training image pairs and their shared information are properly selected, the proposed framework can encode different information into different parts of the latent feature. Based on the framework, we derive Cross-Encoder and Cross-Encoder++, which learn gaze representations from eye images and face images, respectively. Experiments on public gaze datasets demonstrate that Cross-Encoder and Cross-Encoder++ outperform competing methods. A quantitative and qualitative ablation study shows that the gaze feature is successfully extracted.
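The encode, switch, and reconstruct step described in the abstract can be illustrated with a toy sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: the latent is assumed to be a flat vector whose first `gaze_dim` entries hold the gaze-relevant part and whose remaining entries hold the gaze-irrelevant part, a mean-squared reconstruction error is assumed, and the names `split_latent`, `switch_features`, `pair_reconstruction_loss`, `gaze_dim`, and `pair_type` are hypothetical. The paper's actual architecture, feature split, and losses may differ.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Fn = Callable[[Sequence[float]], Vector]  # hypothetical encoder/decoder signature


def split_latent(z: Sequence[float], gaze_dim: int) -> Tuple[Vector, Vector]:
    """Split a latent into (gaze-relevant part, gaze-irrelevant part).

    Assumption: the first `gaze_dim` entries form the gaze part.
    """
    return list(z[:gaze_dim]), list(z[gaze_dim:])


def switch_features(z_a: Sequence[float], z_b: Sequence[float],
                    gaze_dim: int, pair_type: str) -> Tuple[Vector, Vector]:
    """Exchange the part of the latent that the image pair is assumed to share."""
    g_a, e_a = split_latent(z_a, gaze_dim)
    g_b, e_b = split_latent(z_b, gaze_dim)
    if pair_type == "same_gaze":
        # Same gaze: swap the gaze parts, keep each image's own gaze-irrelevant part.
        return g_b + e_a, g_a + e_b
    if pair_type == "same_eye":
        # Same eye: swap the gaze-irrelevant parts, keep each image's own gaze part.
        return g_a + e_b, g_b + e_a
    raise ValueError(f"unknown pair_type: {pair_type!r}")


def mse(x: Sequence[float], y: Sequence[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(x, y)) / len(x)


def pair_reconstruction_loss(x_a: Sequence[float], x_b: Sequence[float],
                             encode_a: Fn, encode_b: Fn,
                             decode_a: Fn, decode_b: Fn,
                             gaze_dim: int, pair_type: str) -> float:
    """Encode both images, switch the assumed-shared part, decode, and score."""
    z_a, z_b = encode_a(x_a), encode_b(x_b)
    s_a, s_b = switch_features(z_a, z_b, gaze_dim, pair_type)
    return mse(decode_a(s_a), x_a) + mse(decode_b(s_b), x_b)


def identity(v: Sequence[float]) -> Vector:
    """Toy stand-in for a trained encoder or decoder."""
    return list(v)


if __name__ == "__main__":
    x_a = [0.3, 1.0, 2.0]  # first entry plays the role of "gaze"
    x_b = [0.3, 5.0, 6.0]  # same "gaze", different "eye"
    nets = (identity, identity, identity, identity)
    print(pair_reconstruction_loss(x_a, x_b, *nets, 1, "same_gaze"))  # 0.0
    print(pair_reconstruction_loss(x_a, x_b, *nets, 1, "same_eye"))   # nonzero
```

In this toy setup the two inputs agree only on their first entry. Switching that entry leaves both reconstructions exact, while switching the other entries of the same pair produces a reconstruction error. One way to read the abstract's idea is through this asymmetry: a switch costs nothing only when the swapped part carries information the pair actually shares, so training on well-chosen pairs pressures that information into that part of the latent.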