IEEE Transactions on Image Processing (a publication of the IEEE Signal Processing Society): Latest Articles

Hyperspectral Information Extraction With Full Resolution From Arbitrary Photographs
IF 13.7
Semin Kwon;Sang Mok Park;Yuhyun Ji;Haripriya Sakthivel;Jung Woo Leem;Young L. Kim
{"title":"Hyperspectral Information Extraction With Full Resolution From Arbitrary Photographs","authors":"Semin Kwon;Sang Mok Park;Yuhyun Ji;Haripriya Sakthivel;Jung Woo Leem;Young L. Kim","doi":"10.1109/TIP.2025.3597038","DOIUrl":"10.1109/TIP.2025.3597038","url":null,"abstract":"Because optical spectrometers capture abundant molecular, biological, and physical information beyond images, ongoing efforts focus on both algorithmic and hardware approaches to obtain detailed spectral information. Spectral reconstruction from red-green-blue (RGB) values acquired by conventional trichromatic cameras has been an active area of study. However, the resultant spectral profile is often affected not only by the unknown spectral properties of the sample itself, but also by light conditions, device characteristics, and image file formats. Existing machine learning models for spectral reconstruction are further limited in generalizability due to their reliance on task-specific training data or fixed models. Advanced spectrometer hardware employing sophisticated nanofabricated components also constrains scalability and affordability. Here we introduce a general computational framework, co-designed with spectrally incoherent color reference charts, to recover the spectral information of an arbitrary sample from a single-shot photo in the visible range. The mutual optimization of reference color selection and the computational algorithm eliminates the need for training data or pretrained models. In transmission mode, altered RGB values of reference colors are used to recover the spectral intensity of the sample, achieving spectral resolution comparable to that of scientific spectrometers. In reflection mode, a spectral hypercube of the sample can be constructed from a single-shot photo, analogous to hyperspectral imaging. The reported computational photography spectrometry has the potential to make optical spectroscopy and hyperspectral imaging accessible using off-the-shelf smartphones.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5429-5441"},"PeriodicalIF":13.7,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11125864","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144884588","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
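The transmission-mode recovery described above is, at its core, a linear inverse problem: each reference patch with a known spectrum contributes a few equations relating the unknown sample spectrum to its altered RGB values. The sketch below illustrates only that idea; the camera-response matrix, the smoothness prior, and all variable names are assumptions for illustration, not the authors' algorithm.

```python
# Minimal sketch (assumed setup, not the paper's method): recover a sample's
# transmission spectrum from the altered RGB values of known reference patches.
import numpy as np

def recover_sample_spectrum(cam_resp, patch_spectra, observed_rgb, smooth=1e-2):
    """cam_resp:      (3, N) spectral sensitivities of the R, G, B channels
    patch_spectra: (K, N) known transmission spectra of the reference colors
    observed_rgb:  (K, 3) RGB values of the patches photographed through the sample
    Returns an N-point estimate of the sample's transmission spectrum."""
    K, N = patch_spectra.shape
    # Each patch k contributes 3 linear equations: rgb_k = cam_resp @ (r_k * s)
    A = np.concatenate([cam_resp * patch_spectra[k][None, :] for k in range(K)], axis=0)  # (3K, N)
    b = observed_rgb.reshape(-1)                                                          # (3K,)
    # Second-difference operator as a simple smoothness (Tikhonov) prior
    D = np.diff(np.eye(N), n=2, axis=0)
    A_reg = np.vstack([A, smooth * D])
    b_reg = np.concatenate([b, np.zeros(D.shape[0])])
    s, *_ = np.linalg.lstsq(A_reg, b_reg, rcond=None)
    return np.clip(s, 0.0, None)
```

The regularized least-squares form is only one of many ways to pose the recovery; the point is that once the reference spectra are fixed, the unknown spectrum enters the observations linearly.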
Semi-Supervised Medical Hyperspectral Image Segmentation Using Adversarial Consistency Constraint Learning and Cross Indication Network
IF 13.7
Geng Qin;Huan Liu;Xueyu Zhang;Wei Li;Yuxing Guo;Chuanbin Guo
{"title":"Semi-Supervised Medical Hyperspectral Image Segmentation Using Adversarial Consistency Constraint Learning and Cross Indication Network","authors":"Geng Qin;Huan Liu;Xueyu Zhang;Wei Li;Yuxing Guo;Chuanbin Guo","doi":"10.1109/TIP.2025.3598499","DOIUrl":"10.1109/TIP.2025.3598499","url":null,"abstract":"Hyperspectral imaging technology is considered a new paradigm for high-precision pathological image segmentation due to its ability to obtain spatial and spectral information of the detected object simultaneously. However, due to the time-consuming and laborious manual annotation, precise annotation of medical hyperspectral images is difficult to obtain. Therefore, there is an urgent need for a semi-supervised learning framework that can fully utilize unlabeled data for medical hyperspectral image segmentation. In this work, we propose an adversarial consistency constraint learning cross indication network (ACCL-CINet), which achieves accurate pathological image segmentation through adversarial consistency constraint learning training strategies. The ACCL-CINet comprises a contextual and structural encoder to form the spatial-spectral feature encoding part. The contextual and structural indications are aggregated into features through a cross indication attention module and finally decoded by a pixel decoder to generate prediction results. For the semi-supervised training strategy, a pixel perceptual consistency module encourages the two models to generate consistent and low-entropy predictions. Secondly, a pixel maximum neighborhood probability adversarial constraint strategy is designed, which produces high-quality pseudo labels for cross supervision training. The proposed ACCL-CINet has been rigorously evaluated on both public and private datasets, with experimental results demonstrating that it outperforms state-of-the-art semi-supervised methods. The code is available at: <uri>https://github.com/Qugeryolo/ACCL-CINet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5414-5428"},"PeriodicalIF":13.7,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144884637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
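For readers unfamiliar with the two training signals named in the abstract (consistency between two models and cross supervision with pseudo labels), here is a generic sketch of how such losses are commonly combined on unlabeled data. The loss forms and weights are illustrative assumptions and do not reproduce the ACCL-CINet implementation.

```python
# Generic semi-supervised losses for two segmentation models (illustrative only).
import torch.nn.functional as F

def semi_supervised_losses(model_a, model_b, unlabeled_x, w_cons=1.0, w_cross=0.5):
    logits_a = model_a(unlabeled_x)          # (B, C, H, W) segmentation logits
    logits_b = model_b(unlabeled_x)
    logp_a = F.log_softmax(logits_a, dim=1)
    logp_b = F.log_softmax(logits_b, dim=1)
    prob_a, prob_b = logp_a.exp(), logp_b.exp()

    # Consistency: symmetric KL pushes the two predictions toward agreement
    cons = 0.5 * (F.kl_div(logp_a, prob_b, reduction="batchmean")
                  + F.kl_div(logp_b, prob_a, reduction="batchmean"))

    # Cross pseudo supervision: each model is trained on the other's hard labels
    pseudo_a = prob_a.argmax(dim=1).detach()
    pseudo_b = prob_b.argmax(dim=1).detach()
    cross = F.cross_entropy(logits_a, pseudo_b) + F.cross_entropy(logits_b, pseudo_a)

    return w_cons * cons + w_cross * cross
```

In practice this unlabeled-data term is added to an ordinary supervised loss computed on the labeled subset.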
Alternating Direction Unfolding With a Cross Spectral Attention Prior for Dual-Camera Compressive Hyperspectral Imaging
IF 13.7
Yubo Dong;Dahua Gao;Danhua Liu;Yanli Liu;Guangming Shi
{"title":"Alternating Direction Unfolding With a Cross Spectral Attention Prior for Dual-Camera Compressive Hyperspectral Imaging","authors":"Yubo Dong;Dahua Gao;Danhua Liu;Yanli Liu;Guangming Shi","doi":"10.1109/TIP.2025.3597775","DOIUrl":"10.1109/TIP.2025.3597775","url":null,"abstract":"Coded Aperture Snapshot Spectral Imaging (CASSI) multiplexes 3D Hyperspectral Images (HSIs) into a 2D sensor to capture dynamic spectral scenes, which, however, sacrifices the spatial information. Dual-Camera Compressive Hyperspectral Imaging (DCCHI) enhances CASSI by incorporating a Panchromatic (PAN) camera to compensate for the loss of spatial information in CASSI. However, the dual-camera structure of DCCHI disrupts the diagonal property of the product of the sensing matrix and its transpose, making it difficult to efficiently and accurately solve the data subproblem in closed-form and thereby hindering the application of model-based methods and Deep Unfolding Networks (DUNs) that rely on such a closed-form solution. To address this issue, we propose an Alternating Direction DUN, named ADRNN, which decouples the imaging model of DCCHI into a CASSI subproblem and a PAN subproblem. The ADRNN alternately solves data terms analytically and a joint prior term in these subproblems. Additionally, we propose a Cross Spectral Transformer (XST) to exploit the joint prior. The XST utilizes cross spectral attention to exploit the correlation between the compressed HSI and the PAN image, and incorporates Grouped-Query Attention (GQA) to alleviate the burden of parameters and computational cost brought by impartially treating the compressed HSI and the PAN image. Furthermore, we built a real DCCHI system and captured large-scale indoor and outdoor scenes for future academic research. Extensive experiments on both simulation and real datasets demonstrate that the proposed method achieves state-of-the-art (SOTA) performance. The code and datasets have been open-sourced at: <uri>https://github.com/ShawnDong98/ADRNN-XST</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5325-5340"},"PeriodicalIF":13.7,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877627","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
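The cross spectral attention with Grouped-Query Attention mentioned above can be pictured as cross-attention in which queries come from the compressed-HSI branch, keys/values come from the PAN branch, and several query heads share one key/value head. The module below is a minimal sketch under assumed shapes and head counts, not the released XST code.

```python
# Grouped-query cross-attention sketch (assumed dimensions, illustrative only).
import torch.nn as nn

class GroupedQueryCrossAttention(nn.Module):
    def __init__(self, dim=64, n_q_heads=8, n_kv_heads=2):
        super().__init__()
        assert n_q_heads % n_kv_heads == 0 and dim % n_q_heads == 0
        self.hd = dim // n_q_heads
        self.n_q, self.n_kv = n_q_heads, n_kv_heads
        self.q = nn.Linear(dim, n_q_heads * self.hd)        # queries from the HSI branch
        self.kv = nn.Linear(dim, 2 * n_kv_heads * self.hd)  # fewer key/value heads from PAN
        self.proj = nn.Linear(n_q_heads * self.hd, dim)

    def forward(self, hsi_tokens, pan_tokens):
        # hsi_tokens: (B, N, dim) tokens from the compressed HSI; pan_tokens: (B, M, dim)
        B, N, _ = hsi_tokens.shape
        q = self.q(hsi_tokens).view(B, N, self.n_q, self.hd).transpose(1, 2)   # (B, Hq, N, hd)
        k, v = self.kv(pan_tokens).chunk(2, dim=-1)
        k = k.view(B, -1, self.n_kv, self.hd).transpose(1, 2)                  # (B, Hkv, M, hd)
        v = v.view(B, -1, self.n_kv, self.hd).transpose(1, 2)
        rep = self.n_q // self.n_kv
        k = k.repeat_interleave(rep, dim=1)   # each KV head serves a group of query heads
        v = v.repeat_interleave(rep, dim=1)
        attn = (q @ k.transpose(-2, -1)) / self.hd ** 0.5
        out = (attn.softmax(-1) @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)
```

Sharing key/value heads is what reduces the parameter and compute cost relative to giving the PAN branch a full set of heads.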
Color Spike Camera Reconstruction via Long Short-Term Temporal Aggregation of Spike Signals
IF 13.7
Yanchen Dong;Ruiqin Xiong;Jing Zhao;Xiaopeng Fan;Xinfeng Zhang;Tiejun Huang
{"title":"Color Spike Camera Reconstruction via Long Short-Term Temporal Aggregation of Spike Signals","authors":"Yanchen Dong;Ruiqin Xiong;Jing Zhao;Xiaopeng Fan;Xinfeng Zhang;Tiejun Huang","doi":"10.1109/TIP.2025.3595368","DOIUrl":"10.1109/TIP.2025.3595368","url":null,"abstract":"With the prevalence of emerging computer vision applications, the demand for capturing dynamic scenes with high-speed motion has increased. A kind of neuromorphic sensor called spike camera shows great potential in this aspect since it generates a stream of binary spikes to describe the dynamic light intensity with a very high temporal resolution. Color spike camera (CSC) was recently invented to capture the color information of dynamic scenes via a color filter array (CFA) on the sensor. This paper proposes a long short-term temporal aggregation strategy of spike signals. First, we utilize short-term temporal correlation to adaptively extract temporal features of each time point. Then we align the features and aggregate them to exploit long-term temporal correlation, suppressing undesired motion blur. To implement the strategy, we design a CSC reconstruction network. Based on adaptive short-term temporal aggregation, we propose a spike representation module to extract temporal features of each color channel, leveraging multiple temporal scales. Considering the long-term temporal correlation, we develop an alignment module to align the temporal features. In particular, we perform motion alignment of red and blue channels with the guidance of the higher-sampling-rate green channel, leveraging motion consistency among color channels. Besides, we propose a module to aggregate the aligned temporal features for the restored color image, which exploits color channel correlation. We have also developed a CSC simulator for data generation. Experimental results demonstrate that our method can restore color images with fine texture details, achieving state-of-the-art CSC reconstruction performance.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5312-5324"},"PeriodicalIF":13.7,"publicationDate":"2025-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144877628","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
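As background for the short-term/long-term aggregation idea, a naive spike-camera baseline estimates intensity as the firing rate inside a short window and then averages several such estimates over a longer horizon. The sketch below shows only that baseline under assumed array shapes; the paper's network replaces the fixed windows and uniform weights with learned, motion-aligned aggregation per color channel.

```python
# Naive spike-to-intensity baseline (illustrative, not the proposed network).
import numpy as np

def short_term_intensity(spikes, t, win=16):
    """spikes: (T, H, W) binary spike stream; returns an (H, W) estimate at time t."""
    lo, hi = max(0, t - win // 2), min(spikes.shape[0], t + win // 2)
    return spikes[lo:hi].mean(axis=0)   # firing rate approximates light intensity

def long_term_aggregate(spikes, t, win=16, offsets=(-32, -16, 0, 16, 32), weights=None):
    """Average several short-term estimates taken at offsets around time t."""
    ests = [short_term_intensity(spikes, int(np.clip(t + o, 0, spikes.shape[0] - 1)), win)
            for o in offsets]
    weights = np.ones(len(ests)) / len(ests) if weights is None else np.asarray(weights)
    return np.tensordot(weights, np.stack(ests), axes=1)
```

Without alignment, such averaging blurs moving content, which is exactly the failure mode the alignment module in the paper is designed to suppress.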
Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance
IF 13.7
Yingqi Lin;Xiaogang Xu;Jiafei Wu;Yan Han;Zhe Liu
{"title":"Geometric-Aware Low-Light Image and Video Enhancement via Depth Guidance","authors":"Yingqi Lin;Xiaogang Xu;Jiafei Wu;Yan Han;Zhe Liu","doi":"10.1109/TIP.2025.3597046","DOIUrl":"10.1109/TIP.2025.3597046","url":null,"abstract":"Low-Light Enhancement (LLE) is aimed at improving the quality of photos/videos captured under low-light conditions. It is worth noting that most existing LLE methods do not take advantage of geometric modeling. We believe that incorporating geometric information can enhance LLE performance, as it provides insights into the physical structure of the scene that influences illumination conditions. To address this, we propose a Geometry-Guided Low-Light Enhancement Refine Framework (GG-LLERF) designed to assist low-light enhancement models in learning improved features by integrating geometric priors into the feature representation space. In this paper, we employ depth priors as the geometric representation. Our approach focuses on the integration of depth priors into various LLE frameworks using a unified methodology. This methodology comprises two key novel modules. First, a depth-aware feature extraction module is designed to inject depth priors into the image representation. Then, the Hierarchical Depth-Guided Feature Fusion Module (HDGFFM) is formulated with a cross-domain attention mechanism, which combines depth-aware features with the original image features within LLE models. We conducted extensive experiments on public low-light image and video enhancement benchmarks. The results illustrate that our framework significantly enhances existing LLE methods. The source code and pre-trained models are available at <uri>https://github.com/Estheryingqi/GG-LLERF</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5442-5457"},"PeriodicalIF":13.7,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
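One common way to realize the depth-guided cross-domain attention described above is to let image features query features encoded from the depth prior and add the attended result back residually. The module below is a hedged sketch with assumed layer sizes and a single global attention for brevity; it is not the released GG-LLERF code.

```python
# Depth-guided feature fusion via cross-domain attention (illustrative sketch).
import torch
import torch.nn as nn

class DepthGuidedFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.depth_enc = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.to_q = nn.Conv2d(channels, channels, 1)   # queries from image features
        self.to_k = nn.Conv2d(channels, channels, 1)   # keys from the depth prior
        self.to_v = nn.Conv2d(channels, channels, 1)
        self.out = nn.Conv2d(channels, channels, 1)

    def forward(self, img_feat, depth):
        # img_feat: (B, C, H, W) low-light image features; depth: (B, 1, H, W) prior
        d = self.depth_enc(depth)
        B, C, H, W = img_feat.shape
        q = self.to_q(img_feat).flatten(2).transpose(1, 2)   # (B, HW, C)
        k = self.to_k(d).flatten(2)                          # (B, C, HW)
        v = self.to_v(d).flatten(2).transpose(1, 2)          # (B, HW, C)
        attn = torch.softmax(q @ k / C ** 0.5, dim=-1)       # cross-domain attention
        fused = (attn @ v).transpose(1, 2).view(B, C, H, W)
        return img_feat + self.out(fused)                    # residual fusion
```

Global spatial attention is quadratic in H*W; practical models typically tile, downsample, or restrict the attention window.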
Confound Controlled Multimodal Neuroimaging Data Fusion and Its Application to Developmental Disorders
IF 13.7
Chuang Liang;Rogers F. Silva;Tülay Adali;Rongtao Jiang;Daoqiang Zhang;Shile Qi;Vince D. Calhoun
{"title":"Confound Controlled Multimodal Neuroimaging Data Fusion and Its Application to Developmental Disorders","authors":"Chuang Liang;Rogers F. Silva;Tülay Adali;Rongtao Jiang;Daoqiang Zhang;Shile Qi;Vince D. Calhoun","doi":"10.1109/TIP.2025.3597045","DOIUrl":"10.1109/TIP.2025.3597045","url":null,"abstract":"Multimodal fusion provides multiple benefits over single modality analysis by leveraging both shared and complementary information from different modalities. Notably, supervised fusion enjoys extensive interest for capturing multimodal co-varying patterns associated with clinical measures. A key challenge of brain data analysis is how to handle confounds, which, if unaddressed, can lead to an unrealistic description of the relationship between the brain and clinical measures. Current approaches often rely on linear regression to remove covariate effects prior to fusion, which may lead to information loss, rather than pursue the more global strategy of optimizing both fusion and covariates removal simultaneously. Thus, we propose “CR-mCCAR” to jointly optimize for confounds within a guided fusion model, capturing co-varying multimodal patterns associated with a specific clinical domain while also discounting covariate effects. Simulations show that CR-mCCAR separate the reference and covariate factors accurately. Functional and structural neuroimaging data fusion reveals co-varying patterns in attention deficit/hyperactivity disorder (ADHD, striato-thalamo-cortical and salience areas) and in autism spectrum disorder (ASD, salience and fronto-temporal areas) that link with core symptoms but uncorrelate with age and motion. These results replicate in an independent cohort. Downstream classification accuracy between ADHD/ASD and controls is markedly higher for CR-mCCAR compared to fusion and regression separately. CR-mCCAR can be extended to include multiple targets and multiple covariates. Overall, results demonstrate CR-mCCAR can jointly optimize for target components that correlate with the reference(s) while removing nuisance covariates. This approach can improve the meaningful detection of reliable phenotype-linked multimodal biomarkers for brain disorders.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5271-5284"},"PeriodicalIF":13.7,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144851240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
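For contrast with the joint optimization proposed above, the two-step baseline the abstract critiques first regresses nuisance covariates out of each modality and then fuses the residuals. A minimal sketch of that residualization step, with illustrative shapes and names:

```python
# Conventional pre-fusion covariate removal by ordinary least squares (baseline only).
import numpy as np

def residualize(data, covariates):
    """data: (subjects, features); covariates: (subjects, k), e.g. age and motion.
    Returns data with the best linear fit of the covariates (plus intercept) removed."""
    X = np.column_stack([np.ones(len(covariates)), covariates])
    beta, *_ = np.linalg.lstsq(X, data, rcond=None)   # (k+1, features)
    return data - X @ beta

# usage (shapes illustrative): fuse(residualize(fmri_maps, covs), residualize(smri_maps, covs))
```

The abstract's point is that discarding the covariate fit before fusion can also discard signal, which is why CR-mCCAR handles covariates inside the fusion objective instead.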
Parts2Whole: Generalizable Multi-Part Portrait Customization
IF 13.7
Hongxing Fan;Zehuan Huang;Lipeng Wang;Haohua Chen;Li Yin;Lu Sheng
{"title":"Parts2Whole: Generalizable Multi-Part Portrait Customization","authors":"Hongxing Fan;Zehuan Huang;Lipeng Wang;Haohua Chen;Li Yin;Lu Sheng","doi":"10.1109/TIP.2025.3597037","DOIUrl":"10.1109/TIP.2025.3597037","url":null,"abstract":"Multi-part portrait customization aims to generate realistic human images by assembling specified body parts from multiple reference images, with significant applications in digital human creation. Existing customization methods typically follow two approaches: 1) test-time fine-tuning, which learn concepts effectively but is time-consuming and struggles with multi-part composition; 2) generalizable feed-forward methods, which offer efficiency but lack fine control over appearance specifics. To address these limitations, we present Parts2Whole, a diffusion-based generalizable portrait generator that harmoniously integrates multiple reference parts into high-fidelity human images by our proposed multi-reference mechanism. To adequately characterize each part, we propose a detail-aware appearance encoder, which is initialized and inherits powerful image priors from the pre-trained denoising U-Net, enabling the encoding of detailed information from reference images. The extracted features are incorporated into the denoising U-Net by a shared self-attention mechanism, enhanced by mask information for precise part selection. Additionally, we integrate pose map conditioning to control the target posture of generated portraits, facilitating more flexible customization. Extensive experiments demonstrate the superiority of our approach over existing methods and applicability to related tasks like pose transfer and pose-guided human image generation, showcasing its versatile conditioning. Our project is available at <uri>https://huanngzh.github.io/Parts2Whole/</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5241-5256"},"PeriodicalIF":13.7,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
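The shared self-attention with mask-based part selection can be sketched as follows: keys and values from the encoded reference parts are concatenated with the denoising branch's own keys and values, and a binary mask suppresses reference tokens that do not belong to the requested parts. All tensor names and shapes below are assumptions for illustration, not the Parts2Whole release.

```python
# Shared self-attention over target tokens plus masked reference-part tokens (sketch).
import torch

def shared_self_attention(q, k_self, v_self, k_ref, v_ref, ref_mask, scale):
    """q, k_self, v_self: (B, N, d) tokens of the denoising branch;
    k_ref, v_ref: (B, M, d) tokens from the reference-part encoder;
    ref_mask: (B, M) float mask, 1.0 for tokens of the selected parts, 0.0 otherwise."""
    k = torch.cat([k_self, k_ref], dim=1)                 # (B, N+M, d)
    v = torch.cat([v_self, v_ref], dim=1)
    attn = (q @ k.transpose(-2, -1)) * scale              # (B, N, N+M)
    # target tokens are always visible; unwanted reference tokens are masked out
    full_mask = torch.cat([torch.ones_like(k_self[..., 0]), ref_mask], dim=1)  # (B, N+M)
    attn = attn.masked_fill(full_mask[:, None, :] == 0, float("-inf"))
    return attn.softmax(dim=-1) @ v                       # (B, N, d)
```

Concatenating reference keys/values into the existing self-attention is what lets a pre-trained U-Net attend to appearance details without an architectural overhaul.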
Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion
IF 13.7
Shenglun Chen;Xinzhu Ma;Hong Zhang;Haojie Li;Zhihui Wang
{"title":"Propagating Sparse Depth via Depth Foundation Model for Out-of-Distribution Depth Completion","authors":"Shenglun Chen;Xinzhu Ma;Hong Zhang;Haojie Li;Zhihui Wang","doi":"10.1109/TIP.2025.3597047","DOIUrl":"10.1109/TIP.2025.3597047","url":null,"abstract":"Depth completion is a pivotal challenge in computer vision, aiming at reconstructing the dense depth map from a sparse one, typically with a paired RGB image. Existing learning-based models rely on carefully prepared but limited data, leading to significant performance degradation in out-of-distribution (OOD) scenarios. Recent foundation models have demonstrated exceptional robustness in monocular depth estimation through large-scale training, and using such models to enhance the robustness of depth completion models is a promising solution. In this work, we propose a novel depth completion framework that leverages depth foundation models to attain remarkable robustness without large-scale training. Specifically, we leverage a depth foundation model to extract environmental cues, including structural and semantic context, from RGB images to guide the propagation of sparse depth information into missing regions. We further design a dual-space propagation approach, without any learnable parameters, to effectively propagate sparse depth in both 3D and 2D spaces to maintain geometric structure and local consistency. To refine the intricate structure, we introduce a learnable correction module to progressively adjust the depth prediction towards the real depth. We train our model on the NYUv2 and KITTI datasets as in-distribution datasets and extensively evaluate the framework on 16 other datasets. Our framework performs remarkably well in the OOD scenarios and outperforms existing state-of-the-art depth completion methods. Our models are released in <uri>https://github.com/shenglunch/PSD</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5285-5299"},"PeriodicalIF":13.7,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144857250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
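A very simple way to see how a monocular depth prior can carry sparse measurements into a dense map is to fit a global scale and shift so that the foundation model's relative depth agrees with the valid sparse points. The sketch below shows only that baseline idea; the paper's dual-space propagation and learnable correction module go well beyond it.

```python
# Scale-shift alignment of relative depth to sparse metric measurements (baseline sketch).
import numpy as np

def align_relative_depth(rel_depth, sparse_depth, valid_mask):
    """rel_depth: (H, W) relative depth from a foundation model;
    sparse_depth: (H, W) metric depth, valid only where valid_mask is True."""
    r = rel_depth[valid_mask]
    s = sparse_depth[valid_mask]
    A = np.stack([r, np.ones_like(r)], axis=1)           # solve s ≈ a * r + b
    (a, b), *_ = np.linalg.lstsq(A, s, rcond=None)
    dense = a * rel_depth + b
    dense[valid_mask] = sparse_depth[valid_mask]          # keep the exact measurements
    return dense
```

A single global affine fit ignores locally varying errors in the prior, which is precisely the gap that structure-aware propagation and correction are meant to close.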
DA3Attacker: A Diffusion-Based Attacker Against Aesthetics-Oriented Black-Box Models
IF 13.7
Shuai He;Shuntian Zheng;Anlong Ming;Yanni Wang;Huadong Ma
{"title":"DA3Attacker: A Diffusion-Based Attacker Against Aesthetics-Oriented Black-Box Models","authors":"Shuai He;Shuntian Zheng;Anlong Ming;Yanni Wang;Huadong Ma","doi":"10.1109/TIP.2025.3594068","DOIUrl":"10.1109/TIP.2025.3594068","url":null,"abstract":"The adage “Beautiful Outside But Ugly Inside” resonates with the security and explainability challenges encountered in image aesthetics assessment (IAA). Although deep neural networks (DNNs) have demonstrated remarkable performance in various IAA tasks, how to probe, explain, and enhance aesthetics-oriented “black-box” models has not yet been investigated to our knowledge. This lack of investigation has significantly impeded the commercial application of IAA. In this paper, we investigate the susceptibility of current IAA models to adversarial attacks and aim to elucidate the underlying mechanisms that contribute to their vulnerabilities. To address this, we propose a novel diffusion-based framework as an attacker (DA3Attacker), capable of generating adversarial examples (AEs) to deceive diverse black-box IAA models. DA3Attacker employs a dedicated Attack Diffusion Transformer, equipped with modular aesthetics-oriented filters. By undergoing two unsupervised training stages, it constructs a latent space to generate AEs and facilitates two distinct yet controllable attack modes: restricted and unrestricted. Extensive experiments on 26 baseline models demonstrate that our method effectively explores the vulnerabilities of these IAA models, while also providing multi-attribute explanations for their feature dependencies. To facilitate further research, we contribute the evaluation tools and four metrics for measuring adversarial robustness, as well as a dataset of 60,000 re-labeled AEs for fine-tuning IAA models. The resources are available here.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5300-5311"},"PeriodicalIF":13.7,"publicationDate":"2025-08-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144802670","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
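To make the "restricted attack" setting concrete, the sketch below shows a generic query-based black-box step: estimate the gradient of an aesthetic score by finite differences over random directions and take a bounded step that lowers the score. This is a textbook baseline given only for orientation; it is not DA3Attacker, which generates adversarial examples with a diffusion model.

```python
# Generic restricted (L-infinity bounded) black-box step against a scoring model (sketch).
import torch

@torch.no_grad()
def restricted_blackbox_step(score_fn, image, eps=8 / 255, sigma=1e-3, n_samples=16, lr=1 / 255):
    """score_fn: callable mapping a (1, 3, H, W) tensor in [0, 1] to a scalar score."""
    grad_est = torch.zeros_like(image)
    for _ in range(n_samples):
        u = torch.randn_like(image)
        # symmetric finite difference along a random direction
        delta = (score_fn(image + sigma * u) - score_fn(image - sigma * u)) / (2 * sigma)
        grad_est += delta * u
    grad_est /= n_samples
    adv = image - lr * grad_est.sign()                  # step that lowers the score
    adv = torch.clamp(adv, image - eps, image + eps)    # stay inside the epsilon-ball
    return torch.clamp(adv, 0.0, 1.0)
```

In an unrestricted setting the perturbation budget is dropped and the attacker instead edits image content, which is where generative models such as the one proposed above come in.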
Enhancing Text-Based Person Retrieval by Combining Fused Representation and Reciprocal Learning With Adaptive Loss Refinement
IF 13.7
Anh D. Nguyen;Hoa N. Nguyen
{"title":"Enhancing Text-Based Person Retrieval by Combining Fused Representation and Reciprocal Learning With Adaptive Loss Refinement","authors":"Anh D. Nguyen;Hoa N. Nguyen","doi":"10.1109/TIP.2025.3594880","DOIUrl":"10.1109/TIP.2025.3594880","url":null,"abstract":"Text-based person retrieval is defined as the challenging task of searching for people’s images based on given textual queries in natural language. Conventional methods primarily use deep neural networks to understand the relationship between visual and textual data, creating a shared feature space for cross-modal matching. The absence of awareness regarding variations in feature granularity between the two modalities, coupled with the diverse poses and viewing angles of images corresponding to the same individual, may lead to overlooking significant differences within each modality and across modalities, despite notable enhancements. Furthermore, the inconsistency in caption queries in large public datasets presents an additional obstacle to cross-modality mapping learning. Therefore, we introduce 3RTPR, a novel text-based person retrieval method that integrates a representation fusing mechanism and an adaptive loss refinement algorithm into a dual-encoder branch architecture. Moreover, we propose training two independent models simultaneously, which reciprocally support each other to enhance learning effectiveness. Consequently, our approach encompasses three significant contributions: (i) proposing a fused representation method to generate more discriminative representations for images and captions; (ii) introducing a novel algorithm to adjust loss and prioritize samples that contain valuable information; and (iii) proposing reciprocal learning involving a pair of independent models, which allows us to enhance general retrieval performance. In order to validate our method’s effectiveness, we also demonstrate superior performance over state-of-the-art methods by performing rigorous experiments on three well-known benchmarks: CUHK-PEDES, ICFG-PEDES, and RSTPReid.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"5147-5157"},"PeriodicalIF":13.7,"publicationDate":"2025-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144796955","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
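As a point of reference for the dual-encoder setup, the standard symmetric contrastive objective aligns matching image and caption embeddings while pushing apart mismatched pairs. The function below shows that common starting point only; 3RTPR's fused representations, adaptive loss refinement, and reciprocal learning are not reproduced here.

```python
# Symmetric InfoNCE-style loss for a dual-encoder retrieval model (common baseline).
import torch
import torch.nn.functional as F

def symmetric_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """img_emb, txt_emb: (B, d) embeddings of matching image/caption pairs."""
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature          # (B, B) similarity matrix
    targets = torch.arange(img.size(0), device=img.device)
    # image-to-text and text-to-image retrieval directions, averaged
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))
```

Adaptive refinements such as those in the paper typically re-weight the per-sample terms of a loss like this so that informative or noisy pairs are emphasized or down-weighted.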