IEEE Transactions on Image Processing: A Publication of the IEEE Signal Processing Society - Latest Publications

Task-to-Instance Prompt Learning for Vision-Language Models at Test Time
Zhihe Lu;Jiawang Bai;Xin Li;Zeyu Xiao;Xinchao Wang
{"title":"Task-to-Instance Prompt Learning for Vision-Language Models at Test Time","authors":"Zhihe Lu;Jiawang Bai;Xin Li;Zeyu Xiao;Xinchao Wang","doi":"10.1109/TIP.2025.3546840","DOIUrl":"10.1109/TIP.2025.3546840","url":null,"abstract":"Prompt learning has been recently introduced into the adaption of pre-trained vision-language models (VLMs) by tuning a set of trainable tokens to replace hand-crafted text templates. Despite the encouraging results achieved, existing methods largely rely on extra annotated data for training. In this paper, we investigate a more realistic scenario, where only the unlabeled test data is available. Existing test-time prompt learning methods often separately learn a prompt for each test sample. However, relying solely on a single sample heavily limits the performance of the learned prompts, as it neglects the task-level knowledge that can be gained from multiple samples. To that end, we propose a novel test-time prompt learning method of VLMs, called Task-to-Instance PromPt LEarning (TIPPLE), which adopts a two-stage training strategy to leverage both task- and instance-level knowledge. Specifically, we reformulate the effective online pseudo-labeling paradigm along with two tailored components: an auxiliary text classification task and a diversity regularization term, to serve the task-oriented prompt learning. After that, the learned task-level prompt is further combined with a tunable residual for each test sample to integrate with instance-level knowledge. We demonstrate the superior performance of TIPPLE on 15 downstream datasets, e.g., the average improvement of 1.87% over the state-of-the-art method, using ViT-B/16 visual backbone. Our code is open-sourced at <uri>https://github.com/zhiheLu/TIPPLE</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1908-1920"},"PeriodicalIF":0.0,"publicationDate":"2025-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143631135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
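For illustration, the two-stage idea can be sketched as follows: a shared prompt is first tuned on a batch of unlabeled test images, and a per-sample residual is then tuned on top of it. This is a minimal stand-in, not the authors' TIPPLE code: the task-level objective is reduced to entropy minimization (the paper uses online pseudo-labeling with an auxiliary text classification task and a diversity regularizer), and `encode_text` / `encode_image` are assumed callables wrapping a CLIP-style model.

```python
import torch
import torch.nn.functional as F

def entropy(logits):
    probs = logits.softmax(dim=-1)
    return -(probs * probs.clamp_min(1e-8).log()).sum(dim=-1).mean()

def learn_task_prompt(encode_text, encode_image, prompt, images, steps=10, lr=1e-3):
    """Stage 1: tune one shared (task-level) prompt on a batch of unlabeled test images."""
    prompt = prompt.clone().requires_grad_(True)
    opt = torch.optim.AdamW([prompt], lr=lr)
    for _ in range(steps):
        text_feat = F.normalize(encode_text(prompt), dim=-1)    # (num_classes, D)
        img_feat = F.normalize(encode_image(images), dim=-1)    # (B, D)
        logits = 100.0 * img_feat @ text_feat.t()
        loss = entropy(logits)          # stand-in for pseudo-labeling + diversity terms
        opt.zero_grad()
        loss.backward()
        opt.step()
    return prompt.detach()

def add_instance_residual(encode_text, encode_image, task_prompt, image, steps=5, lr=1e-3):
    """Stage 2: tune a per-sample residual on top of the frozen task-level prompt."""
    residual = torch.zeros_like(task_prompt, requires_grad=True)
    opt = torch.optim.AdamW([residual], lr=lr)
    for _ in range(steps):
        text_feat = F.normalize(encode_text(task_prompt + residual), dim=-1)
        img_feat = F.normalize(encode_image(image.unsqueeze(0)), dim=-1)
        loss = entropy(100.0 * img_feat @ text_feat.t())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return task_prompt + residual.detach()
```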
Multi-Stage Statistical Texture-Guided GAN for Tilted Face Frontalization
Kangli Zeng;Zhongyuan Wang;Tao Lu;Jianyu Chen;Chao Liang;Zhen Han
{"title":"Multi-Stage Statistical Texture-Guided GAN for Tilted Face Frontalization","authors":"Kangli Zeng;Zhongyuan Wang;Tao Lu;Jianyu Chen;Chao Liang;Zhen Han","doi":"10.1109/TIP.2025.3548896","DOIUrl":"10.1109/TIP.2025.3548896","url":null,"abstract":"Existing pose-invariant face recognition mainly focuses on frontal or profile, whereas high-pitch angle face recognition, prevalent under surveillance videos, has yet to be investigated. More importantly, tilted faces significantly differ from frontal or profile faces in the potential feature space due to self-occlusion, thus seriously affecting key feature extraction for face recognition. In this paper, we asymptotically reshape challenging high-pitch angle faces into a series of small-angle approximate frontal faces and exploit a statistical approach to learn texture features to ensure accurate facial component generation. In particular, we design a statistical texture-guided GAN for tilted face frontalization (STG-GAN) consisting of three main components. First, the face encoder extracts shallow features, followed by the face statistical texture modeling module that learns multi-scale face texture features based on the statistical distributions of the shallow features. Then, the face decoder performs feature deformation guided by the face statistical texture features while highlighting the pose-invariant face discriminative information. With the addition of multi-scale content loss, identity loss and adversarial loss, we further develop a pose contrastive loss of potential spatial features to constrain pose consistency and make its face frontalization process more reliable. On this basis, we propose a divide-and-conquer strategy, using STG-GAN to progressively synthesize faces with small pitch angles in multiple stages to achieve frontalization gradually. A unified end-to-end training across multiple stages facilitates the generation of numerous intermediate results to achieve a reasonable approximation of the ground truth. Extensive qualitative and quantitative experiments on multiple-face datasets demonstrate the superiority of our approach.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1726-1736"},"PeriodicalIF":0.0,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143618164","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
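As a toy illustration of the divide-and-conquer strategy (frontalize a tilted face gradually rather than in one shot), the loop below repeatedly applies a stage generator that reduces the pitch angle by a small step and keeps the intermediate results. `stage_generator` is a hypothetical callable; this is not the STG-GAN architecture itself.

```python
import torch

def progressive_frontalize(stage_generator, face, pitch_deg, step_deg=15.0):
    """Reshape a high-pitch-angle face into a frontal one in small angular steps."""
    intermediates = []
    while pitch_deg > 0:
        target = max(pitch_deg - step_deg, 0.0)            # next, smaller pitch angle
        face = stage_generator(face, torch.tensor([target]))
        intermediates.append(face)                         # keep every intermediate stage
        pitch_deg = target
    return face, intermediates
```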
Cross-Domain Diffusion With Progressive Alignment for Efficient Adaptive Retrieval
Junyu Luo;Yusheng Zhao;Xiao Luo;Zhiping Xiao;Wei Ju;Li Shen;Dacheng Tao;Ming Zhang
{"title":"Cross-Domain Diffusion With Progressive Alignment for Efficient Adaptive Retrieval","authors":"Junyu Luo;Yusheng Zhao;Xiao Luo;Zhiping Xiao;Wei Ju;Li Shen;Dacheng Tao;Ming Zhang","doi":"10.1109/TIP.2025.3547678","DOIUrl":"10.1109/TIP.2025.3547678","url":null,"abstract":"Unsupervised efficient domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, while maintaining low storage cost and high retrieval efficiency. However, existing methods typically fail to address potential noise in the target domain, and directly align high-level features across domains, thus resulting in suboptimal retrieval performance. To address these challenges, we propose a novel Cross-Domain Diffusion with Progressive Alignment method (COUPLE). This approach revisits unsupervised efficient domain adaptive retrieval from a graph diffusion perspective, simulating cross-domain adaptation dynamics to achieve a stable target domain adaptation process. First, we construct a cross-domain relationship graph and leverage noise-robust graph flow diffusion to simulate the transfer dynamics from the source domain to the target domain, identifying lower noise clusters. We then leverage the graph diffusion results for discriminative hash code learning, effectively learning from the target domain while reducing the negative impact of noise. Furthermore, we employ a hierarchical Mixup operation for progressive domain alignment, which is performed along the cross-domain random walk paths. Utilizing target domain discriminative hash learning and progressive domain alignment, COUPLE enables effective domain adaptive hash learning. Extensive experiments demonstrate COUPLE’s effectiveness on competitive benchmarks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1820-1834"},"PeriodicalIF":0.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143599233","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
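The noise-robust graph flow diffusion used by COUPLE is considerably more elaborate, but the basic idea of propagating source labels to the target domain over a cross-domain graph can be sketched with plain label diffusion on a kNN affinity graph (F <- alpha*S*F + (1-alpha)*Y). Everything below, including the function name, is an illustrative assumption rather than the authors' implementation.

```python
import numpy as np

def diffuse_pseudo_labels(features_src, labels_src, features_tgt, k=10, alpha=0.9, iters=20):
    feats = np.concatenate([features_src, features_tgt], axis=0)
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = feats @ feats.T                                      # cosine affinity
    # keep only the k strongest edges per node to form a sparse graph (column 0 is the node itself)
    idx = np.argsort(-sim, axis=1)[:, 1:k + 1]
    W = np.zeros_like(sim)
    np.put_along_axis(W, idx, np.take_along_axis(sim, idx, axis=1), axis=1)
    W = np.maximum(W, W.T)                                     # symmetrize
    d = W.sum(1, keepdims=True).clip(min=1e-8)
    S = W / np.sqrt(d) / np.sqrt(d.T)                          # D^{-1/2} W D^{-1/2}
    n_cls = labels_src.max() + 1
    Y = np.zeros((len(feats), n_cls))
    Y[np.arange(len(labels_src)), labels_src] = 1.0            # seed with source labels
    F = Y.copy()
    for _ in range(iters):                                     # F <- alpha*S*F + (1-alpha)*Y
        F = alpha * S @ F + (1 - alpha) * Y
    scores = F[len(features_src):]                             # target-domain class scores
    return scores.argmax(1), scores.max(1)                     # pseudo-labels and confidence
```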
LangLoc: Language-Driven Localization via Formatted Spatial Description Generation
Weimin Shi;Changhao Chen;Kaige Li;Yuan Xiong;Xiaochun Cao;Zhong Zhou
{"title":"LangLoc: Language-Driven Localization via Formatted Spatial Description Generation","authors":"Weimin Shi;Changhao Chen;Kaige Li;Yuan Xiong;Xiaochun Cao;Zhong Zhou","doi":"10.1109/TIP.2025.3546853","DOIUrl":"10.1109/TIP.2025.3546853","url":null,"abstract":"Existing localization methods commonly employ vision to perceive scene and achieve localization in GNSS-denied areas, yet they often struggle in environments with complex lighting conditions, dynamic objects or privacy-preserving areas. Humans possess the ability to describe various scenes using natural language, effectively inferring their location by leveraging the rich semantic information in these descriptions. Harnessing language presents a potential solution for robust localization. Thus, this study introduces a new task, Language-driven Localization, and proposes a novel localization framework, LangLoc, which determines the user’s position and orientation through textual descriptions. Given the diversity of natural language descriptions, we first design a Spatial Description Generator (SDG), foundational to LangLoc, which extracts and combines the position and attribute information of objects within a scene to generate uniformly formatted textual descriptions. SDG eliminates the ambiguity of language, detailing the spatial layout and object relations of the scene, providing a reliable basis for localization. With generated descriptions, LangLoc effortlessly achieves language-only localization using text encoder and pose regressor. Furthermore, LangLoc can add one image to text input, achieving mutual optimization and feature adaptive fusion across modalities through two modality-specific encoders, cross-modal fusion, and multimodal joint learning strategies. This enhances the framework’s capability to handle complex scenes, achieving more accurate localization. Extensive experiments on the Oxford RobotCar, 4-Seasons, and Virtual Gallery datasets demonstrate LangLoc’s effectiveness in both language-only and visual-language localization across various outdoor and indoor scenarios. Notably, LangLoc achieves noticeable performance gains when using both text and image inputs in challenging conditions such as overexposure, low lighting, and occlusions, showcasing its superior robustness.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1737-1752"},"PeriodicalIF":0.0,"publicationDate":"2025-03-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143599232","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
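To make the "uniformly formatted textual description" idea concrete, the sketch below serializes detected objects (name, attribute, direction, distance) into one fixed template that a text encoder and pose regressor could consume. The actual template and object attributes used by the SDG are not specified here; this format is purely an assumption.

```python
from dataclasses import dataclass

@dataclass
class SceneObject:
    name: str          # e.g. "traffic light"
    attribute: str     # e.g. "red"
    direction: str     # e.g. "front-left"
    distance_m: float  # approximate distance in meters

def format_description(objects):
    """Serialize scene objects into one uniformly formatted sentence."""
    parts = [
        f"a {o.attribute} {o.name} about {o.distance_m:.0f} meters to the {o.direction}"
        for o in objects
    ]
    return "The scene contains " + "; ".join(parts) + "."

print(format_description([
    SceneObject("traffic light", "red", "front-left", 12),
    SceneObject("building", "brick", "right", 30),
]))
```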
Multi-Axis Feature Diversity Enhancement for Remote Sensing Video Super-Resolution
Yi Xiao;Qiangqiang Yuan;Kui Jiang;Yuzeng Chen;Shiqi Wang;Chia-Wen Lin
{"title":"Multi-Axis Feature Diversity Enhancement for Remote Sensing Video Super-Resolution","authors":"Yi Xiao;Qiangqiang Yuan;Kui Jiang;Yuzeng Chen;Shiqi Wang;Chia-Wen Lin","doi":"10.1109/TIP.2025.3547298","DOIUrl":"10.1109/TIP.2025.3547298","url":null,"abstract":"How to aggregate spatial-temporal information plays an essential role in video super-resolution (VSR) tasks. Despite the remarkable success, existing methods adopt static convolution to encode spatial-temporal information, which lacks flexibility in aggregating information in large-scale remote sensing scenes, as they often contain heterogeneous features (e.g., diverse textures). In this paper, we propose a spatial feature diversity enhancement module (SDE) and channel diversity enhancement module (CDE), which explore the diverse representation of different local patterns while aggregating the global response with compactly channel-wise embedding representation. Specifically, SDE introduces multiple learnable filters to extract representative spatial variants and encodes them to generate a dynamic kernel for enriched spatial representation. To explore the diversity in the channel dimension, CDE exploits the discrete cosine transform to transform the feature into the frequency domain. This enriches the channel representation while mitigating massive frequency loss caused by pooling operation. Based on SDE and CDE, we further devise a multi-axis feature diversity enhancement (MADE) module to harmonize the spatial, channel, and pixel-wise features for diverse feature fusion. These elaborate strategies form a novel network for satellite VSR, termed MADNet, which achieves favorable performance against state-of-the-art method BasicVSR++ in terms of average PSNR by 0.14 dB on various video satellites, including JiLin-1, Carbonite-2, SkySat-1, and UrtheCast. Code will be available at <uri>https://github.com/XY-boy/MADNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1766-1778"},"PeriodicalIF":0.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143575370","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
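A minimal sketch of a DCT-based channel descriptor, illustrating the frequency-domain idea behind CDE: each channel is summarized by a few 2-D DCT coefficients instead of plain average pooling (which keeps only the zero-frequency term), and the resulting descriptor drives channel re-weighting. The frequency choices and the attention head below are assumptions, not the authors' design.

```python
import math
import torch
import torch.nn as nn

def dct2d_basis(h, w, u, v):
    """2-D DCT-II basis function of frequency (u, v) on an h x w grid."""
    ys = torch.arange(h).float()
    xs = torch.arange(w).float()
    by = torch.cos((ys + 0.5) * u * math.pi / h)
    bx = torch.cos((xs + 0.5) * v * math.pi / w)
    return by[:, None] * bx[None, :]                         # (h, w)

class DCTChannelAttention(nn.Module):
    def __init__(self, channels, h, w, freqs=((0, 0), (0, 1), (1, 0), (1, 1))):
        super().__init__()
        basis = torch.stack([dct2d_basis(h, w, u, v) for u, v in freqs])  # (F, h, w)
        self.register_buffer("basis", basis)
        self.fc = nn.Sequential(nn.Linear(channels * len(freqs), channels), nn.Sigmoid())

    def forward(self, x):                                     # x: (B, C, H, W), H=h, W=w
        desc = torch.einsum("bchw,fhw->bcf", x, self.basis)   # per-channel DCT coefficients
        weights = self.fc(desc.flatten(1))                    # (B, C) channel weights
        return x * weights[:, :, None, None]

out = DCTChannelAttention(32, 16, 16)(torch.randn(2, 32, 16, 16))
```

Average pooling corresponds to keeping only the (0, 0) coefficient, so adding a few higher frequencies is what "enriches the channel representation" in this view.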
Bayesian Nonnegative Tensor Completion With Automatic Rank Determination
Zecan Yang;Laurence T. Yang;Huaimin Wang;Honglu Zhao;Debin Liu
{"title":"Bayesian Nonnegative Tensor Completion With Automatic Rank Determination","authors":"Zecan Yang;Laurence T. Yang;Huaimin Wang;Honglu Zhao;Debin Liu","doi":"10.1109/TIP.2024.3459647","DOIUrl":"10.1109/TIP.2024.3459647","url":null,"abstract":"Nonnegative CANDECOMP/PARAFAC (CP) factorization of incomplete tensors is a powerful technique for finding meaningful and physically interpretable latent factor matrices to achieve nonnegative tensor completion. However, most existing nonnegative CP models rely on manually predefined tensor ranks, which introduces uncertainty and leads the models to overfit or underfit. Although the presence of CP models within the probabilistic framework can estimate rank better, they lack the ability to learn nonnegative factors from incomplete data. In addition, existing approaches tend to focus on point estimation and ignore estimating uncertainty. To address these issues within a unified framework, we propose a fully Bayesian treatment of nonnegative tensor completion with automatic rank determination. Benefitting from the Bayesian framework and the hierarchical sparsity-inducing priors, the model can provide uncertainty estimates of nonnegative latent factors and effectively obtain low-rank structures from incomplete tensors. Additionally, the proposed model can mitigate problems of parameter selection and overfitting. For model learning, we develop two fully Bayesian inference methods for posterior estimation and propose a hybrid computing strategy that reduces the time overhead for large-scale data significantly. Extensive simulations on synthetic data demonstrate that our model can recover missing data with high precision and automatically estimate CP rank from incomplete tensors. Moreover, results from real-world applications demonstrate that our model is superior to state-of-the-art methods in image and video inpainting. The code is available at <uri>https://github.com/zecanyang/BNTC</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2036-2051"},"PeriodicalIF":0.0,"publicationDate":"2025-03-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143575377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
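The paper performs full Bayesian inference with sparsity-inducing priors; as a rough stand-in, the sketch below completes a nonnegative CP model by projected gradient descent on the observed entries and then prunes components with negligible energy, which mimics (very crudely) automatic rank determination. All names and hyperparameters are assumptions, and the step size may need tuning for a given tensor scale.

```python
import numpy as np

def cp_reconstruct(factors):
    A, B, C = factors                                   # (I,R), (J,R), (K,R)
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def nncp_complete(tensor, mask, rank=8, iters=2000, lr=1e-3, prune_tol=1e-3):
    """Masked nonnegative CP completion with crude rank pruning (3-way tensors)."""
    rng = np.random.default_rng(0)
    factors = [rng.random((dim, rank)) for dim in tensor.shape]
    for _ in range(iters):
        resid = mask * (cp_reconstruct(factors) - tensor)   # error on observed entries only
        grads = [
            np.einsum("ijk,jr,kr->ir", resid, factors[1], factors[2]),
            np.einsum("ijk,ir,kr->jr", resid, factors[0], factors[2]),
            np.einsum("ijk,ir,jr->kr", resid, factors[0], factors[1]),
        ]
        # gradient step followed by projection onto the nonnegative orthant
        factors = [np.maximum(F - lr * G, 0.0) for F, G in zip(factors, grads)]
    # drop components whose factor columns carry (near-)zero energy in every mode
    energy = np.prod([np.linalg.norm(F, axis=0) for F in factors], axis=0)
    keep = energy > prune_tol * energy.max()
    factors = [F[:, keep] for F in factors]
    return cp_reconstruct(factors), int(keep.sum())      # completed tensor, estimated rank
```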
Iris Geometric Transformation Guided Deep Appearance-Based Gaze Estimation
Wei Nie;Zhiyong Wang;Weihong Ren;Hanlin Zhang;Honghai Liu
{"title":"Iris Geometric Transformation Guided Deep Appearance-Based Gaze Estimation","authors":"Wei Nie;Zhiyong Wang;Weihong Ren;Hanlin Zhang;Honghai Liu","doi":"10.1109/TIP.2025.3546465","DOIUrl":"10.1109/TIP.2025.3546465","url":null,"abstract":"The geometric alterations in the iris’s appearance are intricately linked to the gaze direction. However, current deep appearance-based gaze estimation methods mainly rely on latent feature sharing to leverage iris features for improving deep representation learning, often neglecting the explicit modeling of their geometric relationships. To address this issue, this paper revisits the physiological structure of the eyeball and introduces a set of geometric assumptions, such as “the normal vector of the iris center approximates the gaze direction”. Building on these assumptions, we propose an Iris Geometric Transformation Guided Gaze estimation (IGTG-Gaze) module, which establishes an explicit geometric parameter sharing mechanism to link gaze direction and sparse iris landmark coordinates directly. Extensive experimental results demonstrate that IGTG-Gaze seamlessly integrates into various deep neural networks, flexibly extends from sparse iris landmarks to dense eye mesh, and consistently achieves leading performance in both within- and cross-dataset evaluations, all while maintaining end-to-end optimization. These advantages highlight IGTG-Gaze as a practical and effective approach for enhancing deep gaze representation from appearance.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1616-1631"},"PeriodicalIF":0.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143570485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
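The geometric assumption quoted in the abstract can be illustrated directly: fit a plane to 3-D iris boundary landmarks and take its unit normal as the gaze vector. IGTG-Gaze couples 2-D landmark coordinates and gaze inside a deep network; the snippet below shows only the underlying geometry and is not the authors' module.

```python
import numpy as np

def gaze_from_iris_landmarks(points_3d):
    """points_3d: (N, 3) iris boundary landmarks in camera coordinates."""
    center = points_3d.mean(axis=0)
    # the right singular vector with the smallest singular value is the normal
    # of the least-squares plane through the landmarks
    _, _, vt = np.linalg.svd(points_3d - center)
    normal = vt[-1]
    if normal[2] > 0:                  # make the vector point toward the camera (-z)
        normal = -normal
    return normal / np.linalg.norm(normal)

# sanity check: a frontal iris (circle in the x-y plane) yields gaze ~ [0, 0, -1]
theta = np.linspace(0, 2 * np.pi, 8, endpoint=False)
circle = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
print(gaze_from_iris_landmarks(circle + np.array([0.0, 0.0, 60.0])))
```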
Advancing Real-World Stereoscopic Image Super-Resolution via Vision-Language Model
Zhe Zhang;Jianjun Lei;Bo Peng;Jie Zhu;Liying Xu;Qingming Huang
{"title":"Advancing Real-World Stereoscopic Image Super-Resolution via Vision-Language Model","authors":"Zhe Zhang;Jianjun Lei;Bo Peng;Jie Zhu;Liying Xu;Qingming Huang","doi":"10.1109/TIP.2025.3546470","DOIUrl":"10.1109/TIP.2025.3546470","url":null,"abstract":"Recent years have witnessed the remarkable success of the vision-language model in various computer vision tasks. However, how to exploit the semantic language knowledge of the vision-language model to advance real-world stereoscopic image super-resolution remains a challenging problem. This paper proposes a vision-language model-based stereoscopic image super-resolution (VLM-SSR) method, in which the semantic language knowledge in CLIP is exploited to facilitate stereoscopic image SR in a training-free manner. Specifically, by designing visual prompts for CLIP to infer the region similarity, a prompt-guided information aggregation mechanism is presented to capture inter-view information among relevant regions between the left and right views. Besides, driven by the prior knowledge of CLIP, a cognition prior-driven iterative enhancing mechanism is presented to optimize fuzzy regions adaptively. Experimental results on four datasets verify the effectiveness of the proposed method.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2187-2197"},"PeriodicalIF":0.0,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569813","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
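A hypothetical sketch of prompt-guided cross-view aggregation: both views are split into patches, patch features from an assumed image encoder (e.g. a CLIP visual backbone wrapped as `encode_image`) yield a region-similarity matrix, and each left-view patch aggregates its most similar right-view patches. The real training-free pipeline, including the visual prompts and the iterative enhancement stage, is omitted here.

```python
import torch
import torch.nn.functional as F

def to_patches(img, p):
    """Split an image (C, H, W) into non-overlapping p x p patches; H and W must divide by p."""
    c, h, w = img.shape
    patches = img.unfold(1, p, p).unfold(2, p, p)          # (C, H/p, W/p, p, p)
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c, p, p)

def aggregate_cross_view(encode_image, left, right, patch=32, topk=3):
    lp, rp = to_patches(left, patch), to_patches(right, patch)
    lf = F.normalize(encode_image(lp), dim=-1)             # (Nl, D) left patch features
    rf = F.normalize(encode_image(rp), dim=-1)             # (Nr, D) right patch features
    sim = lf @ rf.t()                                      # region-similarity matrix
    w, idx = sim.topk(topk, dim=-1)                        # most relevant right-view patches
    w = w.softmax(dim=-1)
    fused = (w[..., None, None, None] * rp[idx]).sum(1)    # weighted aggregation, (Nl, C, p, p)
    return lp + fused                                      # left patches enriched with inter-view info
```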
Global Cross-Entropy Loss for Deep Face Recognition
Weisong Zhao;Xiangyu Zhu;Haichao Shi;Xiao-Yu Zhang;Guoying Zhao;Zhen Lei
{"title":"Global Cross-Entropy Loss for Deep Face Recognition","authors":"Weisong Zhao;Xiangyu Zhu;Haichao Shi;Xiao-Yu Zhang;Guoying Zhao;Zhen Lei","doi":"10.1109/TIP.2025.3546481","DOIUrl":"10.1109/TIP.2025.3546481","url":null,"abstract":"Contemporary deep face recognition techniques predominantly utilize the Softmax loss function, designed based on the similarities between sample features and class prototypes. These similarities can be categorized into four types: in-sample target similarity, in-sample non-target similarity, out-sample target similarity, and out-sample non-target similarity. When a sample feature from a specific class is designated as the anchor, the similarity between this sample and any class prototype is referred to as in-sample similarity. In contrast, the similarity between samples from other classes and any class prototype is known as out-sample similarity. The terms target and non-target indicate whether the sample and the class prototype used for similarity calculation belong to the same identity or not. The conventional Softmax loss function promotes higher in-sample target similarity than in-sample non-target similarity. However, it overlooks the relation between in-sample and out-sample similarity. In this paper, we propose Global Cross-Entropy loss (GCE), which promotes 1) greater in-sample target similarity over both the in-sample and out-sample non-target similarity, and 2) smaller in-sample non-target similarity to both in-sample and out-sample target similarity. In addition, we propose to establish a bilateral margin penalty for both in-sample target and non-target similarity, so that the discrimination and generalization of the deep face model are improved. To bridge the gap between training and testing of face recognition, we adapt the GCE loss into a pairwise framework by randomly replacing some class prototypes with sample features. We designate the model trained with the proposed Global Cross-Entropy loss as GFace. Extensive experiments on several public face benchmarks, including LFW, CALFW, CPLFW, CFP-FP, AgeDB, IJB-C, IJB-B, MFR-Ongoing, and MegaFace, demonstrate the superiority of GFace over other methods. Additionally, GFace exhibits robust performance in general visual recognition task.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1672-1685"},"PeriodicalIF":0.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
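The bilateral margin idea can be sketched with a cosine-similarity classifier in which the target logit is reduced by one margin and every non-target logit is increased by another before the cross entropy, so the target/non-target gap must exceed the sum of the two margins. This is a simplified stand-in: the full GCE loss additionally couples in-sample and out-sample similarities across the batch, and the margin values below are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BilateralMarginLoss(nn.Module):
    def __init__(self, feat_dim, num_classes, s=64.0, m_pos=0.35, m_neg=0.10):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim))  # class prototypes
        self.s, self.m_pos, self.m_neg = s, m_pos, m_neg

    def forward(self, features, labels):
        cos = F.normalize(features) @ F.normalize(self.weight).t()      # (B, C) similarities
        onehot = F.one_hot(labels, cos.size(1)).float()
        # target similarity penalized by m_pos, non-target similarity boosted by m_neg
        logits = cos - onehot * self.m_pos + (1 - onehot) * self.m_neg
        return F.cross_entropy(self.s * logits, labels)

loss = BilateralMarginLoss(feat_dim=128, num_classes=1000)(
    torch.randn(8, 128), torch.randint(0, 1000, (8,))
)
```

Replacing some prototype rows with normalized sample features of the same identity would turn this into the pairwise variant mentioned in the abstract.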
Decoupled Doubly Contrastive Learning for Cross-Domain Facial Action Unit Detection
Yong Li;Menglin Liu;Zhen Cui;Yi Ding;Yuan Zong;Wenming Zheng;Shiguang Shan;Cuntai Guan
{"title":"Decoupled Doubly Contrastive Learning for Cross-Domain Facial Action Unit Detection","authors":"Yong Li;Menglin Liu;Zhen Cui;Yi Ding;Yuan Zong;Wenming Zheng;Shiguang Shan;Cuntai Guan","doi":"10.1109/TIP.2025.3546479","DOIUrl":"10.1109/TIP.2025.3546479","url":null,"abstract":"Despite the impressive performance of current vision-based facial action unit (AU) detection approaches, they are heavily susceptible to the variations across different domains and the cross-domain AU detection methods are under-explored. In response to this challenge, we propose a decoupled doubly contrastive adaptation (D2CA) approach to learn a purified AU representation that is semantically aligned for the source and target domains. Specifically, we decompose latent representations into AU-relevant and AU-irrelevant components, with the objective of exclusively facilitating adaptation within the AU-relevant subspace. To achieve the feature decoupling, D2CA is trained to disentangle AU and domain factors by assessing the quality of synthesized faces in cross-domain scenarios when either AU or domain attributes are modified. To further strengthen feature decoupling, particularly in scenarios with limited AU data diversity, D2CA employs a doubly contrastive learning mechanism comprising image and feature-level contrastive learning to ensure the quality of synthesized faces and mitigate feature ambiguities. This new framework leads to an automatically learned, dedicated separation of AU-relevant and domain-relevant factors, and it enables intuitive, scale-specific control of the cross-domain facial image synthesis. Extensive experiments demonstrate the efficacy of D2CA in successfully decoupling AU and domain factors, yielding visually pleasing cross-domain synthesized facial images. Meanwhile, D2CA consistently outperforms state-of-the-art cross-domain AU detection approaches, achieving an average F1 score improvement of 6%-14% across various cross-domain scenarios.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2067-2080"},"PeriodicalIF":0.0,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143569141","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
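A minimal sketch of the decoupling plus feature-level contrastive idea: a latent vector is split into AU-relevant and domain-relevant halves, and an InfoNCE loss pulls together the AU-relevant features of source/target samples assumed to share AU labels. The image-level contrastive term and the face-synthesis path of D2CA are omitted; all names and dimensions here are assumptions.

```python
import torch
import torch.nn.functional as F

def decouple(feature, au_dim):
    """Split a latent vector into AU-relevant and domain-relevant parts."""
    return feature[:, :au_dim], feature[:, au_dim:]

def info_nce(anchor, positive, temperature=0.1):
    """Standard InfoNCE: row i of `positive` is the positive pair of row i of `anchor`."""
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.t() / temperature            # (B, B); diagonal entries are positives
    targets = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, targets)

src = torch.randn(16, 512)                      # source-domain backbone features
tgt = torch.randn(16, 512)                      # target features paired by AU label
src_au, _ = decouple(src, au_dim=256)
tgt_au, _ = decouple(tgt, au_dim=256)
loss = info_nce(src_au, tgt_au)                 # align only the AU-relevant subspace
```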