IEEE Transactions on Image Processing: Latest Articles

BITS: Bit-Extendable Incremental Hashing in Open Environments
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-10-01 DOI: 10.1109/tip.2025.3613924
Yongxin Wang, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu
{"title":"BITS: Bit-Extendable Incremental Hashing in Open Environments","authors":"Yongxin Wang, Zhen-Duo Chen, Xin Luo, Xin-Shun Xu","doi":"10.1109/tip.2025.3613924","DOIUrl":"https://doi.org/10.1109/tip.2025.3613924","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"115 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145203267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Robust Reversible Watermarking with Invisible Distortion Against VAE Watermark Removal
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-10-01 DOI: 10.1109/tip.2025.3613958
Bobiao Guo, Ping Ping, Fan Liu, Feng Xu
{"title":"Robust Reversible Watermarking with Invisible Distortion Against VAE Watermark Removal","authors":"Bobiao Guo, Ping Ping, Fan Liu, Feng Xu","doi":"10.1109/tip.2025.3613958","DOIUrl":"https://doi.org/10.1109/tip.2025.3613958","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"9 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145203268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Exploring Vision-Based Active 3D Object Detection by Informativeness Characterization
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-10-01 DOI: 10.1109/tip.2025.3613927
Ruixiang Li, Yiming Wu, Yehao Lu, Xuewei Li, Xian Wang, Xiubo Liang, Xi Li
{"title":"Exploring Vision-Based Active 3D Object Detection by Informativeness Characterization","authors":"Ruixiang Li, Yiming Wu, Yehao Lu, Xuewei Li, Xian Wang, Xiubo Liang, Xi Li","doi":"10.1109/tip.2025.3613927","DOIUrl":"https://doi.org/10.1109/tip.2025.3613927","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"104 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145203236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Mgs-Stereo: Multi-scale Geometric-Structure-Enhanced Stereo Matching for Complex Real-World Scenes.
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-26 DOI: 10.1109/tip.2025.3612754
Zhien Dai, Zhaohui Tang, Hu Zhang, Yongfang Xie
{"title":"Mgs-Stereo: Multi-scale Geometric-Structure-Enhanced Stereo Matching for Complex Real-World Scenes.","authors":"Zhien Dai,Zhaohui Tang,Hu Zhang,Yongfang Xie","doi":"10.1109/tip.2025.3612754","DOIUrl":"https://doi.org/10.1109/tip.2025.3612754","url":null,"abstract":"Complex imaging environments and conditions in real-world scenes pose significant challenges for stereo matching tasks. Models are susceptible to underperformance in non-Lambertian surfaces, weakly textured regions, and occluded regions, due to the difficulty in establishing accurate matching relationships between pixels. To alleviate these problems, we propose a multi-scale geometrically enhanced stereo matching model that exploits the geometric structural relationships of the objects in the scene to mitigate these problems. Firstly, a geometric structure perception module is designed to extract geometric information from the reference view. Secondly, a geometric structure-adaptive embedding module is proposed to integrate geometric information with matching similarity information. This module integrates multi-source features dynamically to predict disparity residuals in different regions. Third, a geometric-based normalized disparity correction module is proposed to improve matching robustness for pathological regions in realistic complex scenes. Extensive evaluations on popular benchmarks demonstrate that our method achieves competitive performance against leading approaches. Notably, our model provides robust and accurate predictions in challenging regions containing edges, occlusions, reflective, and non-Lambertian surfaces. Our source code will be publicly available.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"42 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145153460","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
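The abstract above refers to integrating geometric information with "matching similarity information". The sketch below shows the standard correlation-based cost volume that stereo networks commonly use to expose such matching similarity; it is a generic PyTorch illustration with hypothetical names, not the Mgs-Stereo implementation.

```python
# Generic correlation-based matching cost volume for stereo matching.
# Illustrative only; not the paper's code.
import torch

def build_corr_cost_volume(feat_left: torch.Tensor,
                           feat_right: torch.Tensor,
                           max_disp: int) -> torch.Tensor:
    """feat_left/feat_right: (B, C, H, W) features; returns (B, max_disp, H, W)."""
    b, c, h, w = feat_left.shape
    cost = feat_left.new_zeros(b, max_disp, h, w)
    for d in range(max_disp):
        if d == 0:
            cost[:, d] = (feat_left * feat_right).mean(dim=1)
        else:
            # correlate each left pixel with the right pixel d columns to its left
            cost[:, d, :, d:] = (feat_left[:, :, :, d:] *
                                 feat_right[:, :, :, :w - d]).mean(dim=1)
    return cost

if __name__ == "__main__":
    fl, fr = torch.randn(1, 32, 64, 128), torch.randn(1, 32, 64, 128)
    print(build_corr_cost_volume(fl, fr, max_disp=24).shape)  # torch.Size([1, 24, 64, 128])
```

A module such as the paper's structure-adaptive embedding would then fuse a volume like this with geometric features to predict disparity residuals.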
Reduced Biquaternion Dual-Branch Deraining U-Network via Multi-Attention Mechanism.
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-26 DOI: 10.1109/tip.2025.3612841
Shan Gai, Yihao Ni
{"title":"Reduced Biquaternion Dual-Branch Deraining U-Network via Multi-Attention Mechanism.","authors":"Shan Gai,Yihao Ni","doi":"10.1109/tip.2025.3612841","DOIUrl":"https://doi.org/10.1109/tip.2025.3612841","url":null,"abstract":"As a prerequisite for many vision-oriented tasks, image deraining is an effective solution to alleviate performance degradation of these tasks on rainy days. In recent years, the introduction of deep learning has obtained the significant developments in deraining techniques. However, due to the inherent constraints of synthetic datasets and the insufficient robustness of network architecture designs, most existing methods are difficult to fit varied rain patterns and adapt to the transition from synthetic rainy images to real ones, ultimately resulting in unsatisfactory restoration outcomes. To address these issues, we propose a reduced biquaternion dual-branch deraining U-Network (RQ-D2UNet) for better deraining performance, which is the first attempt to apply the reduced biquaternion-valued neural network in the deraining task. The algebraic properties of reduced biquaternion (RQ) can facilitate modeling the rainy artifacts more accurately while preserving the underlying spatial structure of the background image. The comprehensive design scheme of U-shaped architecture and dual-branch structure can extract multi-scale contextual information and fully explore the mixed correlation between rain and rain-free features. Moreover, we also extend the self-attention and convolutional attention mechanisms in the RQ domain, which allow the proposed model to balance both global dependency capture and local feature extraction. Extensive experimental results on various rainy datasets (i.e., rain streak/rain-haze/raindrop/real rain), downstream vision applications (i.e., object detection and segmentation), and similar image restoration tasks (i.e., image desnowing and low-light image enhancement) demonstrate the superiority and versatility of our proposed method.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"52 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145153461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
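The key primitive behind an RQ-valued network is the reduced-biquaternion (commutative quaternion) product. The sketch below assumes the usual algebra with basis {1, i, j, k}, where i^2 = -1, j^2 = +1, and k = ij; it is illustrative only and is not taken from RQ-D2UNet.

```python
# Reduced-biquaternion (commutative quaternion) product under the assumed
# algebra i^2 = -1, j^2 = +1, k = ij. Illustrative sketch, not the paper's code.
import torch

def rq_mul(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """p, q: (..., 4) tensors holding (real, i, j, k) parts; returns their RQ product."""
    a1, b1, c1, d1 = p.unbind(-1)
    a2, b2, c2, d2 = q.unbind(-1)
    real = a1 * a2 - b1 * b2 + c1 * c2 - d1 * d2
    i    = a1 * b2 + b1 * a2 + c1 * d2 + d1 * c2
    j    = a1 * c2 + c1 * a2 - b1 * d2 - d1 * b2
    k    = a1 * d2 + d1 * a2 + b1 * c2 + c1 * b2
    return torch.stack([real, i, j, k], dim=-1)

if __name__ == "__main__":
    p, q = torch.randn(5, 4), torch.randn(5, 4)
    # reduced biquaternions commute, unlike ordinary quaternions
    assert torch.allclose(rq_mul(p, q), rq_mul(q, p))
```

RQ-valued convolutions and attention layers are typically built by replacing real-valued multiplications with this product while keeping the layer topology unchanged.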
Hyperbolic Self-Paced Multi-Expert Network for Cross-Domain Few-Shot Facial Expression Recognition.
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-25 DOI: 10.1109/tip.2025.3612281
Xueting Chen, Yan Yan, Jing-Hao Xue, Chang Shu, Hanzi Wang
{"title":"Hyperbolic Self-Paced Multi-Expert Network for Cross-Domain Few-Shot Facial Expression Recognition.","authors":"Xueting Chen,Yan Yan,Jing-Hao Xue,Chang Shu,Hanzi Wang","doi":"10.1109/tip.2025.3612281","DOIUrl":"https://doi.org/10.1109/tip.2025.3612281","url":null,"abstract":"Recently, cross-domain few-shot facial expression recognition (CF-FER), which identifies novel compound expressions with a few images in the target domain by using the model trained only on basic expressions in the source domain, has attracted increasing attention. Generally, existing CF-FER methods leverage the multi-dataset to increase the diversity of the source domain and alleviate the discrepancy between the source and target domains. However, these methods learn feature embeddings in the Euclidean space without considering imbalanced expression categories and imbalanced sample difficulty in the multi-dataset. Such a way makes the model difficult to capture hierarchical relationships of facial expressions, resulting in inferior transferable representations. To address these issues, we propose a hyperbolic self-paced multi-expert network (HSM-Net), which contains multiple mixture-of-experts (MoE) layers located in the hyperbolic space, for CF-FER. Specifically, HSM-Net collaboratively trains multiple experts in a self-distillation manner, where each expert focuses on learning a subset of expression categories from the multi-dataset. Based on this, we introduce a hyperbolic self-paced learning (HSL) strategy that exploits sample difficulty to adaptively train the model from easy-to-hard samples, greatly reducing the influence of imbalanced expression categories and imbalanced sample difficulty. Our HSM-Net can effectively model rich hierarchical relationships of facial expressions and obtain a highly transferable feature space. Extensive experiments on both in-the-lab and in-the-wild compound expression datasets demonstrate the superiority of our proposed method over several state-of-the-art methods. Code will be released at https://github.com/cxtjl/HSM-Net.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"42 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145140286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
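Two generic ingredients named in the abstract are hyperbolic embeddings and self-paced learning. The sketch below gives the standard Poincare-ball distance and a hard self-paced weighting rule; both are textbook formulations with hypothetical function names, not HSM-Net's code.

```python
# Standard Poincare-ball distance (unit ball, curvature -1) and a hard
# self-paced weighting rule. Illustrative sketch, not the paper's implementation.
import torch

def poincare_distance(u: torch.Tensor, v: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """u, v: (..., D) points with norm < 1; returns the geodesic distance on the unit ball."""
    sq_u = u.pow(2).sum(-1).clamp(max=1 - eps)
    sq_v = v.pow(2).sum(-1).clamp(max=1 - eps)
    sq_diff = (u - v).pow(2).sum(-1)
    x = 1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v))
    return torch.acosh(x.clamp(min=1 + eps))

def self_paced_weights(losses: torch.Tensor, threshold: float) -> torch.Tensor:
    """Hard self-paced regularizer: keep samples whose loss is below a threshold that grows over training."""
    return (losses < threshold).float()

if __name__ == "__main__":
    u, v = torch.rand(8, 16) * 0.1, torch.rand(8, 16) * 0.1
    d = poincare_distance(u, v)
    print(self_paced_weights(d, threshold=d.median().item()))
```

Distances grow rapidly near the ball's boundary, which is why hyperbolic embeddings are well suited to tree-like (hierarchical) label structures.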
VisionHub: Learning Task-Plugins for Efficient Universal Vision Model.
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-25 DOI: 10.1109/tip.2025.3611645
Haolin Wang, Yixuan Zhu, Wenliang Zhao, Jie Zhou, Jiwen Lu
{"title":"VisionHub: Learning Task-Plugins for Efficient Universal Vision Model.","authors":"Haolin Wang,Yixuan Zhu,Wenliang Zhao,Jie Zhou,Jiwen Lu","doi":"10.1109/tip.2025.3611645","DOIUrl":"https://doi.org/10.1109/tip.2025.3611645","url":null,"abstract":"Building on the success of universal language models in natural language processing (NLP), researchers have recently sought to develop methods capable of tackling a broad spectrum of visual tasks within a unified foundation framework. However, existing universal vision models face significant challenges when adapting to the rapidly expanding scope of downstream tasks. These challenges stem not only from the prohibitive computational and storage expenses associated with training such models but also from the complexity of their workflows, which makes efficient adaptations difficult. Moreover, these models often fail to deliver the required performance and versatility for a broad spectrum of applications, largely due to their incomplete visual generation and perception capabilities, limiting their generalizability and effectiveness in diverse settings. In this paper, we present VisionHub, a novel universal vision model designed to concurrently manage multiple visual restoration and perception tasks, while offering streamlined transferability to downstream tasks. Our model leverages the frozen denoising U-Net architecture from Stable Diffusion as the backbone, fully exploiting its inherent potential for both visual restoration and perception. To further enhance the model's flexibility, we propose the incorporation of lightweight task-plugins and the task router, which are seamlessly integrated onto the U-Net backbone. This architecture enables VisionHub to efficiently handle various vision tasks according to user-provided natural language instructions, all while maintaining minimal storage costs and operational overhead. Extensive experiments across 11 different vision tasks showcase both the efficiency and effectiveness of our approach. Remarkably, VisionHub achieves competitive performance across a variety of benchmarks, including 53.3% mIoU on ADE20K semantic segmentation, 0.253 RMSE on NYUv2 depth estimation, and 74.2 AP on MS-COCO pose estimation.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"91 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145140450","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
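To make the task-plugin idea concrete, the sketch below shows one common way to realize lightweight per-task plug-ins on a frozen backbone: residual bottleneck adapters selected by a router keyed on the task name. All class names and shapes here are hypothetical; this is not VisionHub's actual plug-in or router design.

```python
# Generic "frozen backbone + per-task plug-in + router" pattern.
# Illustrative sketch only; not the VisionHub implementation.
import torch
import torch.nn as nn

class TaskPlugin(nn.Module):
    """Lightweight bottleneck adapter added residually to frozen backbone features."""
    def __init__(self, dim: int, hidden: int = 64):
        super().__init__()
        self.down, self.up, self.act = nn.Linear(dim, hidden), nn.Linear(hidden, dim), nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))

class PluginRouter(nn.Module):
    """Holds one plug-in per task and routes features to it; the backbone itself stays frozen."""
    def __init__(self, dim: int, tasks):
        super().__init__()
        self.plugins = nn.ModuleDict({t: TaskPlugin(dim) for t in tasks})

    def forward(self, feats: torch.Tensor, task: str) -> torch.Tensor:
        return self.plugins[task](feats)

if __name__ == "__main__":
    router = PluginRouter(dim=256, tasks=["segmentation", "depth", "pose"])
    tokens = torch.randn(2, 197, 256)   # stand-in for frozen backbone features
    print(router(tokens, task="depth").shape)  # torch.Size([2, 197, 256])
```

Only the plug-ins are trained, so adding a new task costs a few extra parameters rather than a full fine-tuning run.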
Consistent Assistant Domains Transformer for Source-free Domain Adaptation.
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-25 DOI: 10.1109/tip.2025.3611799
Renrong Shao, Wei Zhang, Kangyang Luo, Qin Li, Jun Wang
{"title":"Consistent Assistant Domains Transformer for Source-free Domain Adaptation.","authors":"Renrong Shao,Wei Zhang,Kangyang Luo,Qin Li,Jun Wang","doi":"10.1109/tip.2025.3611799","DOIUrl":"https://doi.org/10.1109/tip.2025.3611799","url":null,"abstract":"Source-free domain adaptation (SFDA) aims to address the challenge of adapting to a target domain without accessing the source domain directly. However, due to the inaccessibility of source domain data, deterministic invariable features cannot be obtained. Current mainstream methods primarily focus on evaluating invariant features in the target domain that closely resemble those in the source domain, subsequently aligning the target domain with the source domain. However, these methods are susceptible to hard samples and influenced by domain bias. In this paper, we propose a Consistent Assistant Domains Transformer for SFDA, abbreviated as CADTrans, which solves the issue by constructing invariable feature representations of domain consistency. Concretely, we develop an assistant domain module for CADTrans to obtain diversified representations from the intermediate aggregated global attentions, which addresses the limitation of existing methods in adequately representing diversity. Based on assistant and target domains, invariable feature representations are obtained by multiple consistent strategies, which can be used to distinguish easy and hard samples. Finally, to align the hard samples to the corresponding easy samples, we construct a conditional multi-kernel max mean discrepancy (CMK-MMD) strategy to distinguish between samples of the same category and those of different categories. Extensive experiments are conducted on various benchmarks such as Office-31, Office-Home, VISDA-C, and DomainNet-126, proving the significant performance improvements achieved by our proposed approaches. Code is available at https://github.com/RoryShao/CADTrans.git.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"93 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145140452","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
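The CMK-MMD strategy in the abstract builds on the multi-kernel maximum mean discrepancy. The sketch below computes the generic (unconditional) multi-kernel Gaussian MMD between two feature sets; the class-conditional part of CMK-MMD is omitted, and the code is illustrative rather than the authors' implementation.

```python
# Multi-kernel (Gaussian) maximum mean discrepancy between two feature sets.
# Generic biased estimator; the conditional (class-wise) variant used by
# CADTrans is not shown here.
import torch

def multi_kernel_mmd(x: torch.Tensor, y: torch.Tensor,
                     bandwidths=(1.0, 2.0, 4.0, 8.0)) -> torch.Tensor:
    """x: (n, d), y: (m, d); returns the biased MMD^2 estimate summed over kernels."""
    z = torch.cat([x, y], dim=0)
    d2 = torch.cdist(z, z).pow(2)                           # pairwise squared distances
    k = sum(torch.exp(-d2 / (2 * s ** 2)) for s in bandwidths)
    n = x.size(0)
    k_xx, k_yy, k_xy = k[:n, :n], k[n:, n:], k[:n, n:]
    return k_xx.mean() + k_yy.mean() - 2 * k_xy.mean()

if __name__ == "__main__":
    src, tgt = torch.randn(128, 256), torch.randn(96, 256) + 0.5
    print(multi_kernel_mmd(src, tgt).item())
```

Minimizing this quantity pulls the two feature distributions together; a conditional variant applies the same estimator per class (or per easy/hard partition) before aggregating.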
Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Solution
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-25 DOI: 10.1109/tip.2025.3611687
Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu, Xuefeng Zhu, Chunyang Cheng, Zhenhua Feng, Josef Kittler
{"title":"Revisiting RGBT Tracking Benchmarks from the Perspective of Modality Validity: A New Benchmark, Problem, and Solution","authors":"Zhangyong Tang, Tianyang Xu, Xiao-Jun Wu, Xuefeng Zhu, Chunyang Cheng, Zhenhua Feng, Josef Kittler","doi":"10.1109/tip.2025.3611687","DOIUrl":"https://doi.org/10.1109/tip.2025.3611687","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"31 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145141196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Deep Sparse-to-Dense Inbetweening for Multi-View Light Fields.
IF 10.6, Zone 1, Computer Science
IEEE Transactions on Image Processing Pub Date: 2025-09-25 DOI: 10.1109/tip.2025.3612257
Yifan Mao, Zeyu Xiao, Ping An, Deyang Liu, Caifeng Shan
{"title":"Deep Sparse-to-Dense Inbetweening for Multi-View Light Fields.","authors":"Yifan Mao,Zeyu Xiao,Ping An,Deyang Liu,Caifeng Shan","doi":"10.1109/tip.2025.3612257","DOIUrl":"https://doi.org/10.1109/tip.2025.3612257","url":null,"abstract":"Light field (LF) imaging, which captures both intensity and directional information of light rays, extends the capabilities of traditional imaging techniques. In this paper, we introduce a task in the field of LF imaging, sparse-to-dense inbetweening, which focuses on generating dense novel views from sparse multi-view LFs. By synthesizing intermediate views from sparse inputs, this task enhances LF view synthesis through filling in interperspective gaps within an expanded field of view and increasing data robustness by leveraging complementary information between light rays from different perspectives, which are limited by non-robust single-view synthesis and the inability to handle sparse inputs effectively. To address these challenges, we construct a high-quality multi-view LF dataset, consisting of 60 indoor scenes and 59 outdoor scenes. Building upon this dataset, we propose a baseline method. Specifically, we introduce an adaptive alignment module to dynamically align information by capturing relative displacements. Next, we explore angular consistency and hierarchical information using a multi-level feature decoupling module. Finally, a multi-level feature refinement module is applied to enhance features and facilitate reconstruction. Additionally, we introduce a universally applicable artifact-aware loss function to effectively suppress visual artifacts. Experimental results demonstrate that our method outperforms existing approaches, establishing a benchmark for sparse-to-dense inbetweening. The code is available at https://github.com/Starmao1/MutiLF.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"41 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145140288","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
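As background for the sparse-to-dense inbetweening task, the sketch below shows the elementary view-synthesis step of warping one view toward an intermediate viewpoint with a disparity map and bilinear sampling. It illustrates only the task setup under assumed names and shapes; the paper's alignment, decoupling, and refinement modules are not reproduced.

```python
# Disparity-based warping of a source view toward an intermediate viewpoint.
# Illustrative sketch of the basic view-synthesis step, not the paper's method.
import torch
import torch.nn.functional as F

def warp_to_view(src: torch.Tensor, disparity: torch.Tensor, baseline: float) -> torch.Tensor:
    """src: (B, C, H, W) view; disparity: (B, 1, H, W) in pixels; baseline scales the horizontal shift."""
    b, _, h, w = src.shape
    ys, xs = torch.meshgrid(torch.arange(h, dtype=torch.float32),
                            torch.arange(w, dtype=torch.float32), indexing="ij")
    xs = xs.to(src.device).expand(b, -1, -1) + baseline * disparity.squeeze(1)
    ys = ys.to(src.device).expand(b, -1, -1)
    grid = torch.stack([2 * xs / (w - 1) - 1, 2 * ys / (h - 1) - 1], dim=-1)  # normalize to [-1, 1]
    return F.grid_sample(src, grid, mode="bilinear", align_corners=True)

if __name__ == "__main__":
    view = torch.rand(1, 3, 64, 64)
    disp = torch.full((1, 1, 64, 64), 2.0)
    half_step = warp_to_view(view, disp, baseline=0.5)  # view shifted by half the disparity
    print(half_step.shape)  # torch.Size([1, 3, 64, 64])
```

Learned methods such as the one above go beyond this single-view warp by aligning and fusing several sparse views before refinement.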