Large Generative Model Impulsed Lightweight Gaze Estimator via Deformable Approximate Large Kernel Pursuit

Xuanhong Chen; Muchun Chen; Yugang Chen; Yinxin Lin; Bilian Ke; Bingbing Ni

IEEE Transactions on Image Processing, vol. 34, pp. 1149–1162, published 2025-01-20. DOI: 10.1109/TIP.2025.3529379. URL: https://ieeexplore.ieee.org/document/10847727/
Efficient and highly accurate lightweight gaze estimation has been receiving increasing research attention with the emergence of mobile interactive platforms such as mobile devices and AR/VR. State-of-the-art deep-learning-based gaze estimation models suffer either from heavy computational architectures that are infeasible for mobile deployment, or from limited generalization capability that cannot cope with the large diversity of eye textures or distinguish subtle, frequent pupil movements. To mitigate these challenges, we propose a novel lightweight network structure featuring a deformable approximate large kernel, which effectively extends the receptive field to handle complicated eye movements and highly varying eye/gaze-region appearance under a very tight computational budget. In the meantime, we embed the training of the gaze estimator into a control-information extraction module, which serves as a gaze-parameter input that conditions a large generative model (Stable Diffusion V1.5) to output gaze-specific eye images. In this way, the strong generalization capability of the large generative model can be implicitly distilled into our lightweight gaze model. Extensive comparisons with various state-of-the-art gaze estimation methods demonstrate the superiority of our proposed model and training scheme in terms of both accuracy and model complexity.
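The abstract does not spell out the architecture of the deformable approximate large kernel, but the general idea behind such approximations is to emulate the receptive field of a dense K×K kernel with a stack of small (possibly dilated) depthwise kernels at a fraction of the parameter cost. The sketch below is illustrative only — the layer configuration is a hypothetical example, not the paper's actual design — and shows the receptive-field and parameter arithmetic:

```python
# Hedged sketch: the paper's exact "deformable approximate large kernel" is not
# specified in the abstract. This illustrates the generic budget argument for
# approximating a dense large kernel with stacked small depthwise kernels.

def receptive_field(layers):
    """Effective receptive field (per axis) of stacked stride-1 convolutions.
    layers: list of (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer grows the field by (k-1)*dilation
    return rf

def depthwise_params(layers, channels):
    """Parameter count of stacked 2-D depthwise convolutions (no bias)."""
    return sum(k * k * channels for k, _ in layers)

C = 64  # hypothetical channel count
# Hypothetical decomposition: 5x5 depthwise + 7x7 depthwise with dilation 3.
approx = [(5, 1), (7, 3)]

rf = receptive_field(approx)                 # 1 + 4 + 18 = 23
approx_cost = depthwise_params(approx, C)    # (25 + 49) * 64 = 4736
dense_cost = rf * rf * C                     # a true dense 23x23 depthwise: 33856

print(f"emulated kernel: {rf}x{rf}")
print(f"approx params: {approx_cost} vs dense params: {dense_cost}")
```

The same receptive field is reached with roughly 7× fewer depthwise parameters here, which is the kind of trade-off a tight mobile compute budget demands; the deformable component (offset-predicting sampling, as in deformable convolution) would add a small offset-prediction branch on top of this.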