{"title":"从凝视抖动到领域适应:通过操纵高频成分实现凝视估计的通用化","authors":"Ruicong Liu, Haofei Wang, Feng Lu","doi":"10.1007/s11263-024-02233-1","DOIUrl":null,"url":null,"abstract":"<p>Gaze, as a pivotal indicator of human emotion, plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often significantly deteriorates when applied to unseen environments, thereby limiting its practical value. Therefore, enhancing the generalizability of gaze estimators to new domains emerges as a critical challenge. A common limitation in existing domain adaptation research is the inability to identify and leverage truly influential factors during the adaptation process. This shortcoming often results in issues such as limited accuracy and unstable adaptation. To address this issue, this article discovers a truly influential factor in the cross-domain problem, <i>i.e.</i>, high-frequency components (HFC). This discovery stems from an analysis of gaze jitter-a frequently overlooked but impactful issue where predictions can deviate drastically even for visually similar input images. Inspired by this discovery, we propose an “embed-then-suppress\" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC to the input images, then performs domain adaptation by suppressing the impact of HFC. Specifically, the suppression is carried out in a contrasive manner. Each original image is paired with its HFC-embedded version, thereby enabling our method to suppress the HFC impact through contrasting the representations within the pairs. The proposed method is evaluated across four cross-domain gaze estimation tasks. The experimental results show that it not only enhances gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, marking the potential for practical deployment.</p>","PeriodicalId":13752,"journal":{"name":"International Journal of Computer Vision","volume":"1 1","pages":""},"PeriodicalIF":11.6000,"publicationDate":"2024-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components\",\"authors\":\"Ruicong Liu, Haofei Wang, Feng Lu\",\"doi\":\"10.1007/s11263-024-02233-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p>Gaze, as a pivotal indicator of human emotion, plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often significantly deteriorates when applied to unseen environments, thereby limiting its practical value. Therefore, enhancing the generalizability of gaze estimators to new domains emerges as a critical challenge. A common limitation in existing domain adaptation research is the inability to identify and leverage truly influential factors during the adaptation process. This shortcoming often results in issues such as limited accuracy and unstable adaptation. To address this issue, this article discovers a truly influential factor in the cross-domain problem, <i>i.e.</i>, high-frequency components (HFC). This discovery stems from an analysis of gaze jitter-a frequently overlooked but impactful issue where predictions can deviate drastically even for visually similar input images. 
Inspired by this discovery, we propose an “embed-then-suppress\\\" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC to the input images, then performs domain adaptation by suppressing the impact of HFC. Specifically, the suppression is carried out in a contrasive manner. Each original image is paired with its HFC-embedded version, thereby enabling our method to suppress the HFC impact through contrasting the representations within the pairs. The proposed method is evaluated across four cross-domain gaze estimation tasks. The experimental results show that it not only enhances gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, marking the potential for practical deployment.</p>\",\"PeriodicalId\":13752,\"journal\":{\"name\":\"International Journal of Computer Vision\",\"volume\":\"1 1\",\"pages\":\"\"},\"PeriodicalIF\":11.6000,\"publicationDate\":\"2024-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Computer Vision\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://doi.org/10.1007/s11263-024-02233-1\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Computer Vision","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1007/s11263-024-02233-1","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
From Gaze Jitter to Domain Adaptation: Generalizing Gaze Estimation by Manipulating High-Frequency Components
Gaze is a pivotal indicator of human emotion and plays a crucial role in various computer vision tasks. However, the accuracy of gaze estimation often deteriorates significantly when applied to unseen environments, limiting its practical value. Enhancing the generalizability of gaze estimators to new domains is therefore a critical challenge. A common limitation of existing domain adaptation research is the failure to identify and leverage the factors that truly matter during adaptation, which often results in limited accuracy and unstable adaptation. To address this issue, this article identifies a truly influential factor in the cross-domain problem: high-frequency components (HFC). This discovery stems from an analysis of gaze jitter, a frequently overlooked but impactful issue in which predictions can deviate drastically even for visually similar input images. Inspired by this discovery, we propose an "embed-then-suppress" HFC manipulation strategy to adapt gaze estimation to new domains. Our method first embeds additive HFC into the input images, then performs domain adaptation by suppressing the impact of HFC. Specifically, the suppression is carried out in a contrastive manner: each original image is paired with its HFC-embedded version, enabling our method to suppress the HFC impact by contrasting the representations within each pair. The proposed method is evaluated on four cross-domain gaze estimation tasks. The experimental results show that it not only improves gaze estimation accuracy but also significantly reduces gaze jitter in the target domain. Compared with previous studies, our method offers higher accuracy, reduced gaze jitter, and improved adaptation stability, demonstrating its potential for practical deployment.
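To make the "embed-then-suppress" strategy described above concrete, the sketch below illustrates one plausible reading of it in PyTorch. The FFT-based high-pass filter, the filter radius, the mixing weight alpha, and the cosine-similarity pairing loss are all illustrative assumptions rather than the paper's actual formulation, and the function names (extract_hfc, embed_hfc, suppress_hfc_loss) are hypothetical.

```python
# A minimal sketch of the "embed-then-suppress" idea from the abstract,
# assuming a PyTorch setup and images normalized to [0, 1]. All specifics
# (FFT high-pass, radius, alpha, cosine loss) are illustrative guesses.
import torch
import torch.nn.functional as F

def extract_hfc(images: torch.Tensor, radius: int = 8) -> torch.Tensor:
    """Isolate high-frequency components with an FFT high-pass filter.

    images: (B, C, H, W) batch; radius: half-width (in frequency bins) of
    the low-frequency band zeroed out around the spectrum centre.
    """
    freq = torch.fft.fftshift(torch.fft.fft2(images), dim=(-2, -1))
    _, _, h, w = images.shape
    cy, cx = h // 2, w // 2
    mask = torch.ones_like(freq.real)
    # Zero the central (low-frequency) band so only HFC survive.
    mask[..., cy - radius:cy + radius, cx - radius:cx + radius] = 0.0
    hfc = torch.fft.ifft2(torch.fft.ifftshift(freq * mask, dim=(-2, -1))).real
    return hfc

def embed_hfc(images: torch.Tensor, hfc: torch.Tensor,
              alpha: float = 0.5) -> torch.Tensor:
    """Additively embed HFC into the inputs ('embed' step)."""
    return (images + alpha * hfc).clamp(0.0, 1.0)

def suppress_hfc_loss(encoder: torch.nn.Module,
                      images: torch.Tensor,
                      images_hfc: torch.Tensor) -> torch.Tensor:
    """Contrastive-style suppression ('suppress' step): pull each original
    image's representation towards that of its HFC-embedded counterpart,
    so the encoder learns to ignore HFC perturbations."""
    z = F.normalize(encoder(images), dim=-1)
    z_hfc = F.normalize(encoder(images_hfc), dim=-1)
    # Maximise cosine similarity within each (original, embedded) pair.
    return (1.0 - (z * z_hfc).sum(dim=-1)).mean()
```

In a training loop, one would compute images_hfc = embed_hfc(images, extract_hfc(images)) and add suppress_hfc_loss(encoder, images, images_hfc) to the gaze regression loss, nudging the encoder towards HFC-invariant representations; the paper's actual pairing and weighting scheme may differ.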
About the Journal:
The International Journal of Computer Vision (IJCV) serves as a platform for sharing new research findings in the rapidly growing field of computer vision. It publishes 12 issues annually, presenting high-quality, original contributions to the science and engineering of computer vision, and accepts several article types to accommodate different kinds of research output.
Regular articles, spanning up to 25 journal pages, report significant technical advances of broad interest to the field.
Short articles, limited to 10 pages, provide a faster publication path for novel research results.
Survey articles, comprising up to 30 pages, offer critical evaluations of the current state of the art or tutorial presentations of relevant topics, providing comprehensive overviews of specific subject areas.
In addition to technical articles, the journal publishes book reviews, position papers, and editorials by prominent scientific figures, which complement the technical content with broader perspectives.
Authors are encouraged to include online supplementary material such as images, video sequences, data sets, and software, which enhances the understanding and reproducibility of the published research.
Overall, the International Journal of Computer Vision is a comprehensive venue for researchers in this rapidly growing field, covering a range of article types, offering supplementary online resources, and facilitating the dissemination of impactful research.