Latest Articles in IEEE Transactions on Image Processing

HOVER: Hyperbolic Video-Text Retrieval
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-23 · DOI: 10.1109/tip.2025.3611174
Jun Wen, Yufeng Chen, Ruiqi Shi, Wei Ji, Menglin Yang, Difei Gao, Junsong Yuan, Roger Zimmermann
{"title":"HOVER: Hyperbolic Video-Text Retrieval","authors":"Jun Wen, Yufeng Chen, Ruiqi Shi, Wei Ji, Menglin Yang, Difei Gao, Junsong Yuan, Roger Zimmermann","doi":"10.1109/tip.2025.3611174","DOIUrl":"https://doi.org/10.1109/tip.2025.3611174","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"40 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145127830","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
No-Reference Image Quality Assessment Leveraging GenAI Images
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-22 · DOI: 10.1109/tip.2025.3610238
Qingbing Sang, Qian Li, Lixiong Liu, Zhaohong Deng, Xiaojun Wu, Alan C. Bovik
{"title":"No-Reference Image Quality Assessment Leveraging GenAI Images","authors":"Qingbing Sang, Qian Li, Lixiong Liu, Zhaohong Deng, Xiaojun Wu, Alan C. Bovik","doi":"10.1109/tip.2025.3610238","DOIUrl":"https://doi.org/10.1109/tip.2025.3610238","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"37 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Privacy-Preserving Visual Localization with Event Cameras
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-22 · DOI: 10.1109/tip.2025.3607640
Junho Kim, Young Min Kim, Ramzi Zahreddine, Weston A. Welge, Gurunandan Krishnan, Sizhuo Ma, Jian Wang
{"title":"Privacy-Preserving Visual Localization with Event Cameras","authors":"Junho Kim, Young Min Kim, Ramzi Zahreddine, Weston A. Welge, Gurunandan Krishnan, Sizhuo Ma, Jian Wang","doi":"10.1109/tip.2025.3607640","DOIUrl":"https://doi.org/10.1109/tip.2025.3607640","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"39 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145116183","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
SRS: Siamese Reconstruction-Segmentation Network based on Dynamic-Parameter Convolution
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-19 · DOI: 10.1109/tip.2025.3607624
Bingkun Nian, Fenghe Tang, Jianrui Ding, Jie Yang, Zhonglong Zheng, Shaohua Kevin Zhou, Wei Liu
{"title":"SRS: Siamese Reconstruction-Segmentation Network based on Dynamic-Parameter Convolution","authors":"Bingkun Nian, Fenghe Tang, Jianrui Ding, Jie Yang, Zhonglong Zheng, Shaohua Kevin Zhou, Wei Liu","doi":"10.1109/tip.2025.3607624","DOIUrl":"https://doi.org/10.1109/tip.2025.3607624","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"38 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145089107","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gradient and Structure Consistency in Multimodal Emotion Recognition
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-18 · DOI: 10.1109/tip.2025.3608664
QingHongYa Shi, Mang Ye, Wenke Huang, Bo Du, Xiaofen Zong
{"title":"Gradient and Structure Consistency in Multimodal Emotion Recognition.","authors":"QingHongYa Shi,Mang Ye,Wenke Huang,Bo Du,Xiaofen Zong","doi":"10.1109/tip.2025.3608664","DOIUrl":"https://doi.org/10.1109/tip.2025.3608664","url":null,"abstract":"Multimodal emotion recognition is a task that integrates text, visual, and audio data to holistically infer an individual's emotional state. Existing research predominantly focuses on exploiting modality-specific cues for joint learning, often ignoring the differences between multiple modalities under common goal learning. Due to multimodal heterogeneity, common goal learning inadvertently introduces optimization biases and interaction noise. To address above challenges, we propose a novel approach named Gradient and Structure Consistency (GSCon). Our strategy operates at both overall and individual levels to consider balance optimization and effective interaction respectively. At the overall level, to avoid the optimization suppression of a modality on other modalities, we construct a balanced gradient direction that aligns each modality's optimization direction, ensuring unbiased convergence. Simultaneously, at the individual level, to avoid the interaction noise caused by multimodal alignment, we align the spatial structure of samples in different modalities. The spatial structure of the samples will not differ due to modal heterogeneity, achieving effective inter-modal interaction. Extensive experiments on multimodal emotion recognition and multimodal intention understanding datasets demonstrate the effectiveness of the proposed method. Code is available at https://github.com/ShiQingHongYa/GSCon.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"6 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145083514","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
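The individual-level structure constraint lends itself to a compact sketch: a batch's intra-modal similarity matrix describes how samples relate to one another, and GSCon's premise is that this structure should not depend on modality. Below is a minimal, hypothetical PyTorch rendering of such a loss (illustrative only, not the authors' released code; see the GSCon repository above for that):

```python
import torch
import torch.nn.functional as F

def structure_consistency_loss(feat_a: torch.Tensor,
                               feat_b: torch.Tensor) -> torch.Tensor:
    """Penalize differences between the intra-batch similarity structures
    of two modalities. feat_a, feat_b: (batch, dim) embeddings of the same
    samples from two modalities (e.g., text and audio)."""
    a = F.normalize(feat_a, dim=1)       # unit-norm rows
    b = F.normalize(feat_b, dim=1)
    return F.mse_loss(a @ a.T, b @ b.T)  # match the two (B, B) similarity maps
```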
Semantic-Driven Global-Local Fusion Transformer for Image Super-Resolution
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-18 · DOI: 10.1109/tip.2025.3609106
Kaibing Zhang, Zhouwei Cheng, Xin He, Jie Li, Xinbo Gao
{"title":"Semantic-Driven Global-Local Fusion Transformer for Image Super-Resolution.","authors":"Kaibing Zhang,Zhouwei Cheng,Xin He,Jie Li,Xinbo Gao","doi":"10.1109/tip.2025.3609106","DOIUrl":"https://doi.org/10.1109/tip.2025.3609106","url":null,"abstract":"Image Super-Resolution (SR) has seen remarkable progress with the emergence of transformer-based architectures. However, due to the high computational cost, many existing transformer-based SR methods limit their attention to local windows, which hinders their ability to model long-range dependencies and global structures. To address these challenges, we propose a novel SR framework named Semantic-Driven Global-Local Fusion Transformer (SGLFT). The proposed model enhances the receptive field by combining a Hybrid Window Transformer (HWT) and a Scalable Transformer Module (STM) to jointly capture local textures and global context. To further strengthen the semantic consistency of reconstruction, we introduce a Semantic Extraction Module (SEM) that distills high-level semantic priors from the input. These semantic cues are adaptively integrated with visual features through an Adaptive Feature Fusion Semantic Integration Module (AFFSIM). Extensive experiments on standard benchmarks demonstrate the effectiveness of SGLFT in producing visually faithful and structurally consistent SR results. The code will be available at https://github.com/kbzhang0505/SGLFT.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"22 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145083520","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
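One plausible reading of the adaptive fusion step (AFFSIM) is gated blending of a pooled semantic prior into per-pixel visual features. The module below is a sketch under that assumption; the names and shapes are hypothetical, and the paper's actual design may differ (the linked repository is authoritative):

```python
import torch
import torch.nn as nn

class GatedSemanticFusion(nn.Module):
    """Blend visual features with a broadcast global semantic prior
    via a learned per-pixel sigmoid gate."""
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Conv2d(2 * dim, dim, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, visual: torch.Tensor, semantic: torch.Tensor) -> torch.Tensor:
        # visual: (B, C, H, W); semantic: (B, C) pooled prior from an SEM-like encoder
        sem = semantic[:, :, None, None].expand_as(visual)
        g = self.gate(torch.cat([visual, sem], dim=1))  # gate in [0, 1]
        return g * visual + (1 - g) * sem
```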
NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-17 · DOI: 10.1109/tip.2025.3608662
Jialun Cai, Mengyuan Liu, Hong Liu, Shuheng Zhou, Wenhao Li
{"title":"NanoHTNet: Nano Human Topology Network for Efficient 3D Human Pose Estimation","authors":"Jialun Cai, Mengyuan Liu, Hong Liu, Shuheng Zhou, Wenhao Li","doi":"10.1109/tip.2025.3608662","DOIUrl":"https://doi.org/10.1109/tip.2025.3608662","url":null,"abstract":"","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"1 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145077461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
URFusion: Unsupervised Unified Degradation-Robust Image Fusion Network
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607628
Han Xu, Xunpeng Yi, Chen Lu, Guangcan Liu, Jiayi Ma
{"title":"URFusion: Unsupervised Unified Degradation-Robust Image Fusion Network.","authors":"Han Xu,Xunpeng Yi,Chen Lu,Guangcan Liu,Jiayi Ma","doi":"10.1109/tip.2025.3607628","DOIUrl":"https://doi.org/10.1109/tip.2025.3607628","url":null,"abstract":"When dealing with low-quality source images, existing image fusion methods either fail to handle degradations or are restricted to specific degradations. This study proposes an unsupervised unified degradation-robust image fusion network, termed as URFusion, in which various types of degradations can be uniformly eliminated during the fusion process, leading to high-quality fused images. URFusion is composed of three core modules: intrinsic content extraction, intrinsic content fusion, and appearance representation learning and assignment. It first extracts degradation-free intrinsic content features from images affected by various degradations. These content features then provide feature-level rather than image-level fusion constraints for optimizing the fusion network, effectively eliminating degradation residues and reliance on ground truth. Finally, URFusion learns the appearance representation of images and assign the statistical appearance representation of high-quality images to the content-fused result, producing the final high-quality fused image. Extensive experiments on multi-exposure image fusion and multi-modal image fusion tasks demonstrate the advantages of URFusion in fusion performance and suppression of multiple types of degradations. The code is available at https://github.com/hanna-xu/URFusion.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"17 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071877","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
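The closing step, assigning the statistics of high-quality appearance to the content-fused result, reads like AdaIN-style feature re-normalization. A rough analogue under that assumption (hypothetical, not the released URFusion code):

```python
import torch

def assign_appearance(content: torch.Tensor, reference: torch.Tensor,
                      eps: float = 1e-5) -> torch.Tensor:
    """Re-normalize content features so they carry the channel-wise
    mean/std of a high-quality reference. Both tensors: (B, C, H, W)."""
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    r_mean = reference.mean(dim=(2, 3), keepdim=True)
    r_std = reference.std(dim=(2, 3), keepdim=True)
    return (content - c_mean) / c_std * r_std + r_mean
```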
Harmonized Domain Enabled Alternate Search for Infrared and Visible Image Alignment
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607585
Zhiying Jiang, Zengxi Zhang, Jinyuan Liu
{"title":"Harmonized Domain Enabled Alternate Search for Infrared and Visible Image Alignment.","authors":"Zhiying Jiang,Zengxi Zhang,Jinyuan Liu","doi":"10.1109/tip.2025.3607585","DOIUrl":"https://doi.org/10.1109/tip.2025.3607585","url":null,"abstract":"Infrared and visible image alignment is essential and critical to the fusion and multi-modal perception applications. It addresses discrepancies in position and scale caused by spectral properties and environmental variations, ensuring precise pixel correspondence and spatial consistency. Existing manual calibration requires regular maintenance and exhibits poor portability, challenging the adaptability of multi-modal application in dynamic environments. In this paper, we propose a harmonized representation based infrared and visible image alignment, achieving both high accuracy and scene adaptability. Specifically, with regard to the disparity between multi-modal images, we develop an invertible translation process to establish a harmonized representation domain that effectively encapsulates the feature intensity and distribution of both infrared and visible modalities. Building on this, we design a hierarchical framework to correct deformations inferred from the harmonized domain in a coarse-to-fine manner. Our framework leverages advanced perception capabilities alongside residual estimation to enable accurate regression of sparse offsets, while an alternate correlation search mechanism ensures precise correspondence matching. Furthermore, we propose the first ground truth available misaligned infrared and visible image benchmark for evaluation. Extensive experiments validate the effectiveness of the proposed method against the state-of-the-arts, advancing the subsequent applications further.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"50 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
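Offset regression of this kind is typically driven by a correlation volume: per-pixel matching scores between a reference feature map and shifted versions of the moving one. A generic brute-force sketch of that building block (illustrative; the paper's alternate correlation search refines this idea):

```python
import torch
import torch.nn.functional as F

def local_correlation(feat_ref: torch.Tensor, feat_mov: torch.Tensor,
                      radius: int = 3) -> torch.Tensor:
    """Score every integer offset within `radius` at every pixel.
    feat_ref, feat_mov: (B, C, H, W); returns (B, (2*radius+1)**2, H, W)."""
    B, C, H, W = feat_ref.shape
    padded = F.pad(feat_mov, [radius, radius, radius, radius])  # pad W, then H
    scores = []
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            shifted = padded[:, :, dy:dy + H, dx:dx + W]
            scores.append((feat_ref * shifted).sum(dim=1, keepdim=True))
    return torch.cat(scores, dim=1)
```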
Source-Free Object Detection with Detection Transformer
IF 10.6 · CAS Q1 · Computer Science
IEEE Transactions on Image Processing · Pub Date: 2025-09-16 · DOI: 10.1109/tip.2025.3607621
Huizai Yao, Sicheng Zhao, Shuo Lu, Hui Chen, Yangyang Li, Guoping Liu, Tengfei Xing, Chenggang Yan, Jianhua Tao, Guiguang Ding
{"title":"Source-Free Object Detection with Detection Transformer.","authors":"Huizai Yao,Sicheng Zhao,Shuo Lu,Hui Chen,Yangyang Li,Guoping Liu,Tengfei Xing,Chenggang Yan,Jianhua Tao,Guiguang Ding","doi":"10.1109/tip.2025.3607621","DOIUrl":"https://doi.org/10.1109/tip.2025.3607621","url":null,"abstract":"Source-Free Object Detection (SFOD) enables knowledge transfer from a source domain to an unsupervised target domain for object detection without access to source data. Most existing SFOD approaches are either confined to conventional object detection (OD) models like Faster R-CNN or designed as general solutions without tailored adaptations for novel OD architectures, especially Detection Transformer (DETR). In this paper, we introduce Feature Reweighting ANd Contrastive Learning NetworK (FRANCK), a novel SFOD framework specifically designed to perform query-centric feature enhancement for DETRs. FRANCK comprises four key components: (1) an Objectness Score-based Sample Reweighting (OSSR) module that computes attention-based objectness scores on multi-scale encoder feature maps, reweighting the detection loss to emphasize less-recognized regions; (2) a Contrastive Learning with Matching-based Memory Bank (CMMB) module that integrates multi-level features into memory banks, enhancing class-wise contrastive learning; (3) an Uncertainty-weighted Query-fused Feature Distillation (UQFD) module that improves feature distillation through prediction quality reweighting and query feature fusion; and (4) an improved self-training pipeline with a Dynamic Teacher Updating Interval (DTUI) that optimizes pseudo-label quality. By leveraging these components, FRANCK effectively adapts a source-pretrained DETR model to a target domain with enhanced robustness and generalization. Extensive experiments on several widely used benchmarks demonstrate that our method achieves state-of-the-art performance, highlighting its effectiveness and compatibility with DETR-based SFOD models.","PeriodicalId":13217,"journal":{"name":"IEEE Transactions on Image Processing","volume":"30 1","pages":""},"PeriodicalIF":10.6,"publicationDate":"2025-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"145071832","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
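Self-training pipelines of this kind rest on a mean-teacher loop: the teacher that produces pseudo-labels is an exponential moving average (EMA) of the student. A generic sketch of that update (standard form assumed here; FRANCK's DTUI additionally adapts when the update runs):

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               momentum: float = 0.999) -> None:
    """Update teacher weights as an exponential moving average of the
    student's, as in mean-teacher self-training."""
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(momentum).add_(s, alpha=1.0 - momentum)
```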