{"title":"SDSFusion: A Semantic-Aware Infrared and Visible Image Fusion Network for Degraded Scenes","authors":"Jun Chen;Liling Yang;Wei Yu;Wenping Gong;Zhanchuan Cai;Jiayi Ma","doi":"10.1109/TIP.2025.3571339","DOIUrl":"10.1109/TIP.2025.3571339","url":null,"abstract":"A single-modal infrared or visible image offers limited representation in scenes with lighting degradation or extreme weather. We propose a multi-modal fusion framework, named SDSFusion, for all-day and all-weather infrared and visible image fusion. SDSFusion exploits the commonality in image processing to achieve enhancement, fusion, and semantic task interaction in a unified framework guided by semantic awareness and multi-scale features and losses. To address the disparity between infrared and visible images in degraded scenes, we differentiate modal features in a unified fusion model. Unlike existing joint fusion methods, we propose an adversarial generative network that refines the reconstruction of low-light images by embedding fused features. It provides feature-level brightness supplementation and image reconstruction to refine brightness and contrast. Extensive experiments in degraded scenes confirm that our approach is superior to state-of-the-art approaches in visual quality and performance, demonstrating the effectiveness of interaction improvement. The code will be posted at: <uri>https://github.com/Liling-yang/SDSFusion</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3139-3153"},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144130364","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-Modal Molecular Representation Learning via Structure Awareness","authors":"Rong Yin;Ruyue Liu;Xiaoshuai Hao;Xingrui Zhou;Yong Liu;Can Ma;Weiping Wang","doi":"10.1109/TIP.2025.3570604","DOIUrl":"10.1109/TIP.2025.3570604","url":null,"abstract":"Accurate extraction of molecular representations is a critical step in the drug discovery process. In recent years, significant progress has been made in molecular representation learning methods, among which multi-modal molecular representation methods based on images, and 2D/3D topologies have become increasingly mainstream. However, existing these multi-modal approaches often directly fuse information from different modalities, overlooking the potential of intermodal interactions and failing to adequately capture the complex higher-order relationships and invariant features between molecules. To overcome these challenges, we propose a structure-awareness-based multi-modal self-supervised molecular representation pre-training framework (MMSA) designed to enhance molecular graph representations by leveraging invariant knowledge between molecules. The framework consists of two main modules: the multi-modal molecular representation learning module and the structure-awareness module. The multi-modal molecular representation learning module collaboratively processes information from different modalities of the same molecule to overcome intermodal differences and generate a unified molecular embedding. Subsequently, the structure-awareness module enhances the molecular representation by constructing a hypergraph structure to model higher-order correlations between molecules. This module also introduces a memory mechanism for storing typical molecular representations, aligning them with memory anchors in the memory bank to integrate invariant knowledge, thereby improving the model’s generalization ability. Compared to existing multi-modal approaches, MMSA can be seamlessly integrated with any graph-based method and supports multiple molecular data modalities, ensuring both versatility and compatibility. Extensive experiments have demonstrated the effectiveness of MMSA, which achieves state-of-the-art performance on the MoleculeNet benchmark, with average ROC-AUC improvements ranging from 1.8% to 9.6% over baseline methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3225-3238"},"PeriodicalIF":0.0,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144130363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HLDD: Hierarchically Learned Detector and Descriptor for Robust Image Matching","authors":"Maoqing Hu;Bin Sun;Fuhua Zhang;Shutao Li","doi":"10.1109/TIP.2025.3568310","DOIUrl":"10.1109/TIP.2025.3568310","url":null,"abstract":"Image matching is a critical task in computer vision research, focusing on aligning two or more images with similar features. Feature detection and description constitute the core of image matching. Handcrafted detectors are capable of obtaining distinctive points but these points may not be repeatable on the image pairs especially those with dramatic appearance changes. On the contrary, the learned detectors can extract a large number of repeatable points but many of them tend to be ambiguous points with low distinctiveness. Moreover, in the scenarios of dramatic appearance change, commonly used contrast or triplet loss in the training of descriptors employ the hard negative mining strategy, which may obtain overly challenging negative samples by global sampling, resulting in sluggish convergence or even overfitting. Those learned descriptors may not guarantee that the corresponding points enjoy larger similarities than unmatched ones, leading to inaccurate matches. To address those issues, we propose a hierarchically learned detector and descriptor (HLDD) for robust image matching, which contains three modules: a handcrafted-learned detector, a hierarchically learned descriptor, and a coarse-to-fine matching strategy. The handcrafted-learned detector integrates the advantages of handcrafted and learned detectors. It extracts distinctive feature points from a learned repeatability map robust to image changes and eliminates the ambiguous ones according to a learned distinctiveness map. The descriptor is trained by a proposed hierarchical triplet loss, which employs a dual window strategy. It can obtain the hardest negative samples in local windows, which are comparatively easier over global sampling, ensuring the effective training of descriptors. The coarse-to-fine matching strategy performs global and local mutual nearest neighbor matching on the coarse and fine descriptor maps respectively to improve the matching accuracy progressively. By comparing with other matching methods, experimental results demonstrate the superiority of the proposed method in the task of image matching, homography estimation, visual localization, and relative pose estimation. Moreover, ablation studies illustrate the effectiveness of the three proposed modules.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3123-3138"},"PeriodicalIF":0.0,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144122156","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Invariant Feature Extraction Functions for UME-Based Point Cloud Detection and Registration","authors":"Amit Efraim;Yuval Haitman;Joseph M. Francos","doi":"10.1109/TIP.2025.3570628","DOIUrl":"10.1109/TIP.2025.3570628","url":null,"abstract":"Point clouds are unordered sets of coordinates in 3D with no functional relation imposed on them. The Rigid Transformation Universal Manifold Embedding (RTUME) is a mapping of volumetric or surface measurements on a 3D object to matrices, such that when two observations on the same object are related by a rigid transformation, this relation is preserved between their corresponding RTUME matrices, thus providing linear and robust solution to the registration and detection problems. To make the RTUME framework of 3D object detection and registration applicable for processing point cloud observations, there is a need to define a function that assigns each point in the cloud with a value (feature vector), invariant to the action of the transformation group. Since existing feature extraction functions do not achieve the desired level of invariance to rigid transformations, to the variability of sampling patterns, and to model mismatches, we present a novel approach for designing dense feature extraction functions, compatible with the requirements of the RTUME framework. One possible implementation of the approach is to adapt existing feature extracting functions, whether learned or analytic, designed for the estimation of point correspondences, to the RTUME framework. The novel feature-extracting function design employs integration over <inline-formula> <tex-math>$SO(3)$ </tex-math></inline-formula> to marginalize the pose dependency of extracted features, followed by projecting features between point clouds using nearest neighbor projection to overcome other sources of model mismatch. In addition, the non-linear functions that define the RTUME mapping are optimized using an MLP model, trained to minimize the RTUME registration errors. The overall RTUME registration performance is evaluated using standard registration benchmarks, and is shown to outperform existing SOTA methods.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3209-3224"},"PeriodicalIF":0.0,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Diffusion-Based Facial Aesthetics Enhancement With 3D Structure Guidance","authors":"Lisha Li;Jingwen Hou;Weide Liu;Yuming Fang;Jiebin Yan","doi":"10.1109/TIP.2025.3551077","DOIUrl":"10.1109/TIP.2025.3551077","url":null,"abstract":"Facial Aesthetics Enhancement (FAE) aims to improve facial attractiveness by adjusting the structure and appearance of a facial image while preserving its identity as much as possible. Most existing methods adopted deep feature-based or score-based guidance for generation models to conduct FAE. Although these methods achieved promising results, they potentially produced excessively beautified results with lower identity consistency or insufficiently improved facial attractiveness. To enhance facial aesthetics with less loss of identity, we propose the Nearest Neighbor Structure Guidance based on Diffusion (NNSG-Diffusion), a diffusion-based FAE method that beautifies a 2D facial image with 3D structure guidance. Specifically, we propose to extract FAE guidance from a nearest neighbor reference face. To allow for less change of facial structures in the FAE process, a 3D face model is recovered by referring to both the matched 2D reference face and the 2D input face, so that the depth and contour guidance can be extracted from the 3D face model. Then the depth and contour clues can provide effective guidance to Stable Diffusion with ControlNet for FAE. Extensive experiments demonstrate that our method is superior to previous relevant methods in enhancing facial aesthetics while preserving facial identity.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1879-1894"},"PeriodicalIF":0.0,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143672286","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Label Space-Induced Pseudo Label Refinement for Multi-Source Black-Box Domain Adaptation","authors":"Chaehwa Yoo;Xiaofeng Liu;Fangxu Xing;Jonghye Woo;Je-Won Kang","doi":"10.1109/TIP.2025.3570220","DOIUrl":"10.1109/TIP.2025.3570220","url":null,"abstract":"Conventional unsupervised domain adaptation (UDA) requires access to source data and/or source model parameters, prohibiting its practical application in terms of privacy, security, and intellectual property. Recent black-box UDA (BDA) reduces such constraints by defining a pseudo label from a single encapsulated source application programming interface (API) prediction, which allows for self-training of the target model. Nonetheless, existing methods have limited consideration for multi-source settings, in which multiple source domain APIs are available to generate pseudo labels. In this work, we introduce a novel training framework for multi-source BDA (MSBDA), dubbed Label Space-Induced Pseudo Label Refinement (LPR). Specifically, LPR incorporates a Pseudo label Refinery Network (PRN) that learns the relationship among source domains conditioned by the target domain only utilizing source API’s prediction. The target model is adapted by our dual phases PRN. First, a warm-up phase targets to avoid failure due to noisy samples and provide an initial pseudo-label, which is followed by a label refinement phase with domain relationship exploration. We provide theoretical support for the mechanism of the LPR. Experimental results on four benchmark datasets demonstrate that MSBDA using LPR achieves competitive performance compared to state-of-the-art approaches with different DA settings.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3181-3193"},"PeriodicalIF":0.0,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144113866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LIPT: Latency-Aware Image Processing Transformer","authors":"Junbo Qiao;Wei Li;Haizhen Xie;Hanting Chen;Jie Hu;Shaohui Lin;Jungong Han","doi":"10.1109/TIP.2025.3567832","DOIUrl":"10.1109/TIP.2025.3567832","url":null,"abstract":"Transformer is leading a trend in the field of image processing. While existing lightweight image processing transformers have achieved notable success, they primarily focus on reducing FLOPs (floating-point operations) or the number of parameters, rather than on practical inference acceleration. In this paper, we present a latency-aware image processing transformer, termed LIPT. We devise the low-latency proportion LIPT block that substitutes memory-intensive operators with the combination of self-attention and convolutions to achieve practical speedup. Specifically, we propose a novel non-volatile sparse masking self-attention (NVSM-SA) that utilizes a pre-computing sparse mask to capture contextual information from a larger window with no extra computation overload. Besides, a high-frequency reparameterization module (HRM) is proposed to make LIPT block reparameterization friendly, enhancing the model’s ability to reconstruct fine details. Extensive experiments on multiple image processing tasks (e.g., image super-resolution (SR), JPEG artifact reduction, and image denoising) demonstrate the superiority of LIPT on both latency and PSNR. LIPT achieves real-time GPU inference with state-of-the-art performance on multiple image SR benchmarks. The source codes are released at <uri>https://github.com/Lucien66/LIPT</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"3056-3069"},"PeriodicalIF":0.0,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144104763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DGC-Net: Dynamic Graph Contrastive Network for Video Object Detection","authors":"Qiang Qi;Hanzi Wang;Yan Yan;Xuelong Li","doi":"10.1109/TIP.2025.3551158","DOIUrl":"10.1109/TIP.2025.3551158","url":null,"abstract":"Video object detection is a challenging task in computer vision since it needs to handle the object appearance degradation problem that seldom occurs in the image domain. Off-the-shelf video object detection methods typically aggregate multi-frame features at one stroke to alleviate appearance degradation. However, these existing methods do not take supervision knowledge into consideration and thus still suffer from insufficient feature aggregation, resulting in the false detection problem. In this paper, we take a different perspective on feature aggregation, and propose a dynamic graph contrastive network (DGC-Net) for video object detection, including three improvements against existing methods. First, we design a frame-level graph contrastive module to aggregate frame features, enabling our DGC-Net to fully exploit discriminative contextual feature representations to facilitate video object detection. Second, we develop a proposal-level graph contrastive module to aggregate proposal features, making our DGC-Net sufficiently learn discriminative semantic feature representations. Third, we present a graph transformer to dynamically adjust the graph structure by pruning the useless nodes and edges, which contributes to improving accuracy and efficiency as it can eliminate the geometric-semantic ambiguity and reduce the graph scale. Furthermore, inherited from the framework of DGC-Net, we develop DGC-Net Lite to perform real-time video object detection with a much faster inference speed. Extensive experiments conducted on the ImageNet VID dataset demonstrate that our DGC-Net outperforms the performance of current state-of-the-art methods. Notably, our DGC-Net obtains 86.3%/87.3% mAP when using ResNet-101/ResNeXt-101.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2269-2284"},"PeriodicalIF":0.0,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143661353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Per-Pixel Calibration Based on Multi-View 3D Reconstruction Errors Beyond the Depth of Field","authors":"Rong Dai;Wenpan Li;Yun-Hui Liu","doi":"10.1109/TIP.2025.3551165","DOIUrl":"10.1109/TIP.2025.3551165","url":null,"abstract":"In 3D microscopic imaging, the extremely shallow depth of field presents a challenge for accurate 3D reconstruction in cases of significant defocus. Traditional calibration methods rely on the spatial extraction of feature points to establish spatial 3D information as the optimization objective. However, these methods suffer from reduced extraction accuracy under defocus conditions, which causes degradation of calibration performance. To extend calibration volume without compromising accuracy in defocused scenarios, we propose a per-pixel calibration based on multi-view 3D reconstruction errors. It utilizes 3D reconstruction errors among different binocular setups as an optimization objective. We first analyze multi-view 3D reconstruction error distributions under the poor-accuracy optical model by employing a multi-view microscopic 3D measurement system using telecentric lenses. Subsequently, the 3D proportion model is proposed for implementing our error-based per-pixel calibration, derived as a spatial linear expression directly correlated with the 3D reconstruction error distribution. The experimental results confirm the robust convergence of our method with multiple binocular setups. Near the focus volume, the multi-view 3D reconstruction error remains approximately <inline-formula> <tex-math>$8~mu $ </tex-math></inline-formula> m (less than 0.5 camera pixel pitch), with absolute accuracy maintained within 0.5% of the measurement range. Beyond tenfold depth of field, the multi-view 3D reconstruction error increases to around <inline-formula> <tex-math>$30~mu $ </tex-math></inline-formula> m (still less than 2 camera pixel pitches), while absolute accuracy remains within 1% of the measurement range. These high-precision measurement results validate the feasibility and accuracy of our proposed calibration.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2124-2132"},"PeriodicalIF":0.0,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143661351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Geodesic-Aligned Gradient Projection for Continual Task Learning","authors":"Benliu Qiu;Heqian Qiu;Haitao Wen;Lanxiao Wang;Yu Dai;Fanman Meng;Qingbo Wu;Hongliang Li","doi":"10.1109/TIP.2025.3551139","DOIUrl":"10.1109/TIP.2025.3551139","url":null,"abstract":"Deep networks notoriously suffer from performance deterioration on previous tasks when learning from sequential tasks, i.e., catastrophic forgetting. Recent methods of gradient projection show that the forgetting is resulted from the gradient interference on old tasks and accordingly propose to update the network in an orthogonal direction to the task space. However, these methods assume the task space is invariant and neglect the gradual change between tasks, resulting in sub-optimal gradient projection and a compromise of the continual learning capacity. To tackle this problem, we propose to embed each task subspace into a non-Euclidean manifold, which can naturally capture the change of tasks since the manifold is intrinsically non-static compared to the Euclidean space. Subsequently, we analytically derive the accumulated projection between any two subspaces on the manifold along the geodesic path by integrating an infinite number of intermediate subspaces. Building upon this derivation, we propose a novel geodesic-aligned gradient projection (GAGP) method that harnesses the accumulated projection to mitigate catastrophic forgetting. The proposed method utilizes the geometric structure information on the task manifold by capturing the gradual change between the new and the old tasks. Empirical studies on image classification demonstrate that the proposed method alleviates catastrophic forgetting and achieves on-par or better performance compared to the state-of-the-art approaches.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"1995-2007"},"PeriodicalIF":0.0,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143661521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}