Large Generative Model Impulsed Lightweight Gaze Estimator via Deformable Approximate Large Kernel Pursuit

Xuanhong Chen; Muchun Chen; Yugang Chen; Yinxin Lin; Bilian Ke; Bingbing Ni

IEEE Transactions on Image Processing, vol. 34, pp. 1149–1162, published 2025-01-20. DOI: 10.1109/TIP.2025.3529379. URL: https://ieeexplore.ieee.org/document/10847727/
Efficient and highly accurate lightweight gaze estimation has been receiving increasing research attention with the emergence of mobile interactive platforms such as mobile devices and AR/VR. State-of-the-art deep-learning-based gaze estimation models suffer either from heavy computational architectures that are infeasible for mobile deployment, or from limited generalization capability that cannot cope with the large diversity of eye textures or distinguish subtle, frequent pupil movements. To mitigate these challenges, we propose a novel lightweight network structure featuring a deformable approximate large kernel, which effectively extends the receptive field to handle complicated eye movements and highly varying eye/gaze-region appearance under a very tight computational budget. In the meantime, we embed the training of the gaze estimator into a control-information extraction module, which serves as a gaze-parameter input that conditions a large generative model (Stable Diffusion V1.5) to output gaze-specific eye images. In this way, the strong generalization capability of the large generative model can be implicitly distilled into our lightweight gaze model. Extensive comparisons with various state-of-the-art gaze estimation methods demonstrate the superiority of our proposed model and training scheme in terms of both accuracy and model complexity.
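The abstract does not spell out the architecture of the deformable approximate large kernel, but the general idea behind such approximations is to emulate the receptive field of a dense K×K kernel with a stack of small (possibly dilated) depthwise kernels at a fraction of the parameter cost. The sketch below is illustrative only — the layer configuration is a hypothetical example, not the paper's actual design — and shows the receptive-field and parameter arithmetic:

```python
# Hedged sketch: the paper's exact "deformable approximate large kernel" is not
# specified in the abstract. This illustrates the generic budget argument for
# approximating a dense large kernel with stacked small depthwise kernels.

def receptive_field(layers):
    """Effective receptive field (per axis) of stacked stride-1 convolutions.
    layers: list of (kernel_size, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer grows the field by (k-1)*dilation
    return rf

def depthwise_params(layers, channels):
    """Parameter count of stacked 2-D depthwise convolutions (no bias)."""
    return sum(k * k * channels for k, _ in layers)

C = 64  # hypothetical channel count
# Hypothetical decomposition: 5x5 depthwise + 7x7 depthwise with dilation 3.
approx = [(5, 1), (7, 3)]

rf = receptive_field(approx)                 # 1 + 4 + 18 = 23
approx_cost = depthwise_params(approx, C)    # (25 + 49) * 64 = 4736
dense_cost = rf * rf * C                     # a true dense 23x23 depthwise: 33856

print(f"emulated kernel: {rf}x{rf}")
print(f"approx params: {approx_cost} vs dense params: {dense_cost}")
```

The same receptive field is reached with roughly 7× fewer depthwise parameters here, which is the kind of trade-off a tight mobile compute budget demands; the deformable component (offset-predicting sampling, as in deformable convolution) would add a small offset-prediction branch on top of this.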