{"title":"MTMLNet: Multi-Task Mutual Learning Network for Infrared Small Target Detection and Segmentation","authors":"Bo Yang;Fengqian Li;Songliang Zhao;Wei Wang;Jun Luo;Huayan Pu;Mingliang Zhou;Yangjun Pi","doi":"10.1109/TIP.2025.3587576","DOIUrl":"10.1109/TIP.2025.3587576","url":null,"abstract":"Infrared small target detection has been extensively studied due to its wide range of applications. Most studies treat infrared small target detection as an independent task, either as a detection-based or a segmentation-based, failing to fully leverage the supervisory information from different annotation forms. To address this issue, we propose a multi-task mutual learning network (MTMLNet) specifically designed for infrared small targets, aiming to enhance both detection and segmentation performance by effectively utilizing various forms of supervisory information. Specifically, we design a multi-stage feature aggregation (MFA) module capable of capturing features with varying gradients and receptive fields simultaneously. Additionally, a hybrid pooling down-sampling (HPDown) module is proposed to mitigate information loss during the down-sampling process of infrared small targets. Finally, the hierarchical feature fusion (HFF) module is designed to adaptively select and fuse features from different semantic layers, learning the optimal way to fuse features across semantic layers. The results on IRSTD-1k and SIRST-V2 datasets show that our proposed MTMLNet achieves state-of-the-art (SOTA) performance in both detection-based and segmentation-based methods. The codes are available at <uri>https://github.com/YangBo0411/MTMLNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4414-4425"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sharing Task-Relevant Information in Visual Prompt Tuning by Cross-Layer Dynamic Connection","authors":"Nan Zhou;Jiaxin Chen;Di Huang","doi":"10.1109/TIP.2025.3587587","DOIUrl":"10.1109/TIP.2025.3587587","url":null,"abstract":"Recent progress has shown great potential of visual prompt tuning (VPT) when adapting pre-trained vision transformers to various downstream tasks. However, most existing solutions independently optimize prompts at each layer, thereby neglecting the usage of task-relevant information encoded in prompt tokens across layers. Additionally, existing prompt structures are prone to interference from task-irrelevant noise in input images, which can adversely affect the sharing of task-relevant information. In this paper, we propose a novel VPT approach, SVPT. It innovatively incorporates a cross-layer dynamic connection (CDC) for input prompt tokens from adjacent layers, enabling effective sharing of task-relevant information. Furthermore, we design a dynamic aggregation (DA) module that facilitates selective sharing of information between layers. The combination of CDC and DA enhances the flexibility of the attention process within the VPT framework. Building upon these foundations, SVPT introduces an attentive enhancement (AE) mechanism that automatically identifies salient image tokens and refines them with prompt tokens in an additive manner. Extensive experiments on 24 image classification and semantic segmentation benchmarks clearly demonstrate the advantages of the proposed SVPT, compared to the state-of-the-art counterparts.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4527-4540"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629856","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FullLoRA: Efficiently Boosting the Robustness of Pretrained Vision Transformers","authors":"Zheng Yuan;Jie Zhang;Shiguang Shan;Xilin Chen","doi":"10.1109/TIP.2025.3587598","DOIUrl":"10.1109/TIP.2025.3587598","url":null,"abstract":"In recent years, the Vision Transformer (ViT) model has gradually become mainstream in various computer vision tasks, and the robustness of the model has received increasing attention. However, existing large models tend to prioritize performance during training, potentially neglecting the robustness, which may lead to serious security concerns. In this paper, we establish a new challenge: exploring how to use a small number of additional parameters for adversarial finetuning to quickly and effectively enhance the adversarial robustness of a standardly trained model. To address this challenge, we develop novel LNLoRA module, incorporating a learnable layer normalization before the conventional LoRA module, which helps mitigate magnitude differences in parameters between the adversarial and standard training paradigms. Furthermore, we propose the FullLoRA framework by integrating the learnable LNLoRA modules into all key components of ViT-based models while keeping the pretrained model frozen, which can significantly improve the model robustness via adversarial finetuning in a parameter-efficient manner. Extensive experiments on several datasets demonstrate the superiority of our proposed FullLoRA framework. It achieves comparable robustness with full finetuning while only requiring about 5% of the learnable parameters. This also effectively addresses concerns regarding extra model storage space and enormous training time caused by adversarial finetuning.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4580-4590"},"PeriodicalIF":0.0,"publicationDate":"2025-07-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144629844","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Joint Visual Compression and Perception Framework for Neuromorphic Spiking Camera","authors":"Kexiang Feng;Chuanmin Jia;Siwei Ma;Wen Gao","doi":"10.1109/TIP.2025.3581372","DOIUrl":"10.1109/TIP.2025.3581372","url":null,"abstract":"The advent of Neuromorphic spike cameras has garnered significant attention for their ability to capture continuous motion with unparalleled temporal resolution. However, this imaging attribute necessitates considerable resources for binary spike data storage and transmission. In light of compression and spike-driven intelligent applications, we present the notion of Spike Coding for Intelligence (SCI), wherein spike sequences are compressed and optimized for both bit-rate and task performance. Drawing inspiration from the mammalian vision system, we propose a dual-pathway architecture for separate processing of spatial semantics and motion information, which is then merged to produce features for compression. A refinement scheme is also introduced to ensure consistency between decoded features and motion vectors. We further propose a temporal regression approach that integrates various motion dynamics, capitalizing on the advancements in warping and deformation simultaneously. Comprehensive experiments demonstrate our scheme achieves state-of-the-art (SOTA) performance for spike compression and analysis. We achieve an average 17.25% BD-rate reduction compared to SOTA codecs and a 4.3% accuracy improvement over SpiReco for spike-based classification, with 88.26% complexity reduction and 42.41% inference time saving on the encoding side.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4343-4356"},"PeriodicalIF":0.0,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144602617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Soft Neighbors Supported Contrastive Clustering","authors":"Yu Duan;Huimin Chen;Runxin Zhang;Rong Wang;Feiping Nie;Xuelong Li","doi":"10.1109/TIP.2025.3583194","DOIUrl":"10.1109/TIP.2025.3583194","url":null,"abstract":"Existing deep clustering methods leverage contrastive or non-contrastive learning to facilitate downstream tasks. Most contrastive-based methods typically learn representations by comparing positive pairs (two views of the same sample) against negative pairs (views of different samples). However, we spot that this hard treatment of samples ignores inter-sample relationships, leading to class collisions and degrade clustering performances. In this paper, we propose a soft neighbor supported contrastive clustering method to address this issue. Specifically, we first introduce a concept called perception radius to quantify similarity confidence between a sample and its neighbors. Based on this insight, we design a two-level soft neighbor loss that captures both local and global neighborhood relationships. Additionally, a cluster-level loss enforces compact and well-separated cluster distributions. Finally, we conduct a pseudo-label refinement strategy to mitigate false negative samples. Extensive experiments on benchmark datasets demonstrate the superiority of our method. The code is available at <uri>https://github.com/DuannYu/soft-neighbors-supported-clustering</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4315-4327"},"PeriodicalIF":0.0,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144562422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NSB-H2GAN: “Negative Sample”-Boosted Hierarchical Heterogeneous Graph Attention Network for Interpretable Classification of Whole-Slide Images","authors":"Meiyan Liang;Shupeng Zhang;Xikai Wang;Bo Li;Muhammad Hamza Javed;Xiaojun Jia;Lin Wang","doi":"10.1109/TIP.2025.3583127","DOIUrl":"10.1109/TIP.2025.3583127","url":null,"abstract":"Gigapixel whole-slide image (WSI) prediction and region-of-interest localization present considerable challenges due to the diverse range of features both across different slides and within individual slides. Most current methods rely on weakly supervised learning using homogeneous graphs to establish context-aware relevance within slides, often neglecting the rich diversity of heterogeneous information inherent in pathology images. Inspired by the negative sampling strategy of the Determinantal Point Process (DPP) and the hierarchical structure of pathology slides, we introduce the Negative Sample Boosted Hierarchical Heterogeneous Graph Attention Network (NSB-H2GAN). This model addresses the over-smoothing issue typically encountered in classical Graph Convolutional Networks (GCNs) when applied to pathology slides. By incorporating “negative samples” at multiple scales and utilizing hierarchical, heterogeneous feature discrimination, NSB-H2GAN more effectively captures the unique features of each patch, leading to an improved representation of gigapixel WSIs. We evaluated the performance of NSB-H2GAN on three publicly available datasets: CAMELYON16, TCGA-NSCLC and TCGA-COAD. The results show that NSB-H2GAN significantly outperforms existing state-of-the-art methods in both qualitative and quantitative evaluations. Moreover, NSB-H2GAN generates more detailed and interpretable heatmaps, allowing for precise localization of tiny lesions as small as <inline-formula> <tex-math>$200mu mtimes 200mu m$ </tex-math></inline-formula> that are often missed by the human eye. The robust performance of NSB-H2GAN offers a new paradigm for computer-aided pathology diagnosis and holds great potential for advancing the clinical applications of computational pathology.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4215-4229"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144547026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-View Clustering With Incremental Instances and Views","authors":"Chao Zhang;Zhi Wang;Xiuyi Jia;Zechao Li;Chunlin Chen;Huaxiong Li","doi":"10.1109/TIP.2025.3583122","DOIUrl":"10.1109/TIP.2025.3583122","url":null,"abstract":"Multi-view clustering (MVC) has attracted increasing attention with the emergence of various data collected from multiple sources. In real-world dynamic environment, instances are continually gathered, and the number of views expands as new data sources become available. Learning for such simultaneous increment of instances and views, particularly in unsupervised scenarios, is crucial yet underexplored. In this paper, we address this problem by proposing a novel MVC method with Incremental Instances and Views, MVC-IIV for short. MVC-IIV contains two stages, an initial stage and an incremental stage. In the initial stage, a basic latent multi-view subspace clustering model is constructed to handle existing data, which can be viewed as traditional static MVC. In the incremental stage, the previously trained model is reused to guide learning for newly arriving instances with new views, transferring historical knowledge while avoiding redundant computations. In specific, we design and reuse two modules, i.e., multi-view embedding module for low-dimensional representation learning, and consensus centroids module for cluster probability learning. By adding consistency regularization on the two modules, the knowledge acquired from previous data is used, which not only enhances the exploration within current data batch, but also extracts the between-batch data correlations. The proposed model can be efficiently solved with linear space and time complexity. Extensive experiments demonstrate the effectiveness and efficiency of our method compared with the state-of-the-art approaches.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4203-4214"},"PeriodicalIF":0.0,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144547027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"WS-SAM: Generalizing SAM to Weakly Supervised Object Detection With Category Label","authors":"Hao Wang;Tong Jia;Qilong Wang;Wangmeng Zuo","doi":"10.1109/TIP.2025.3581729","DOIUrl":"10.1109/TIP.2025.3581729","url":null,"abstract":"Building an effective object detector usually depends on large well-annotated training samples. While annotating such dataset is extremely laborious and costly, where box-level supervision which contains both accurate classification category and localization coordinate is required. Compared to above box-level supervised annotation, those weakly supervised learning manners (e.g,, category, point and scribble) need relatively less laborious annotation cost, and provide a feasible way to mitigate the reliance on the dataset. Because of the lack of sufficient supervised information, current weakly supervised methods cannot achieve satisfactory detection performance. Recently, Segment Anything Model (SAM) has appeared as a task-agnostic foundation model and shown promising performance improvement in many related works due to its powerful generalization and data processing abilities. The properties of the SAM inspire us to adopt such basic benchmark to weakly supervised object detection field to compensate the deficiencies in supervised information. However, directly deploying SAM on weakly supervised object detection task meets with two issues. Firstly, SAM needs meticulously-designed prompts, and such expert-level prompts restrict their applicability and practicality. Besides, SAM is a category unawareness model, and it cannot assign the category labels to the generated predictions. To solve above issues, we propose WS-SAM, which generalizes Segment Anything Model (SAM) to weakly supervised object detection with category label. Specifically, we design an adaptive prompt generator to take full advantages of the spatial and semantic information from the prompt. It employs in a self-prompting manner by taking the output of SAM from the previous iteration as the prompt input to guide the next iteration, where the prompts can be adaptively generated based on the classification activation map. We also develop a segmentation mask refinement module and formulate the label assignment process as a shortest path optimization problem by considering the similarity between each location and prompts. Furthermore, a bidirectional adapter is also implemented to resolve the domain discrepancy by incorporating domain-specific information. We evaluate the effectiveness of our method on several detection datasets (e.g., PASCAL VOC and MS COCO), and the experiment results show that our proposed method can achieve clear improvement over state-of-the-art methods, while performing favorably against state-of-the-arts.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4052-4066"},"PeriodicalIF":0.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144500691","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Wavelet-Guided Deep Unfolding Network for Single Image Reflection Removal","authors":"Ya-Nan Zhang;Qiufu Li;Xu Wu;Nan Mu;Xiaoning Li;Linlin Shen","doi":"10.1109/TIP.2025.3581418","DOIUrl":"10.1109/TIP.2025.3581418","url":null,"abstract":"Removing unwanted reflections from images is a fundamental yet challenging problem in low-level computer vision. Recent deep learning-based Single Image Reflection Removal (SIRR) methods have made significant progress. However, separating reflections from transmission content remains difficult, particularly in complex scenes where the two exhibit high visual similarity. Upon careful analysis, we find that reflections predominantly reside in the high-frequency components of an image. These reflections tend to distort fine details in the high-frequency range, while the low-frequency information remains relatively less affected. This observation motivates us to explore a frequency-aware approach for SIRR by leveraging the Discrete Wavelet Transform (DWT). The wavelet decomposition enables us to distinguish and isolate reflective artifacts in the frequency domain while preserving the transmission information. Building on this insight, we propose a novel Wavelet-guided Deep Unfolding Network (WDUNet) that leverages the strengths of wavelet decomposition and deep unfolding techniques to improve interpretability and generalization in SIRR. Specifically, we formulate an optimization-based reflection removal model using DWT and convolutional dictionaries. The proposed model is optimized via a proximal gradient algorithm and then unfolded into a neural network architecture, where all parameters are learned end-to-end during training. By combining wavelet domain analysis with deep unfolding, WDUNet enhances both the interpretability and generalization of SIRR methods. Additionally, we design and integrate the Low-frequency Parameter Estimation Module (LPEM) and High-frequency Parameter Estimation Module (HPEM) modules into WDUNet, allowing the network to automatically learn and optimize the models’ hyperparameters. Extensive experiments conducted on four benchmark datasets demonstrate that WDUNet consistently outperforms existing state-of-the-art methods in both objective evaluation metrics and subjective visual quality.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4040-4051"},"PeriodicalIF":0.0,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144500690","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DCPI-Depth: Explicitly Infusing Dense Correspondence Prior to Unsupervised Monocular Depth Estimation","authors":"Mengtan Zhang;Yi Feng;Qijun Chen;Rui Fan","doi":"10.1109/TIP.2025.3581422","DOIUrl":"10.1109/TIP.2025.3581422","url":null,"abstract":"There has been a recent surge of interest in learning to perceive depth from monocular videos in an unsupervised fashion. A key challenge in this field is achieving robust and accurate depth estimation in regions with weak textures or where dynamic objects are present. This study makes three major contributions by delving deeply into dense correspondence priors to provide existing frameworks with explicit geometric constraints. The first novel contribution is a contextual-geometric depth consistency loss, which employs depth maps triangulated from dense correspondences based on estimated ego-motion to guide the learning of depth perception from contextual information, since explicitly triangulated depth maps capture accurate relative distances among pixels. The second novel contribution arises from the observation that there exists an explicit, deducible relationship between optical flow divergence and depth gradient. A differential property correlation loss is therefore designed to refine depth estimation with a specific emphasis on local variations. The third novel contribution is a bidirectional stream co-adjustment strategy that enhances the interaction between rigid and optical flows, encouraging the former towards more accurate correspondence and making the latter more adaptable across various scenarios under the static scene hypotheses. DCPI-Depth, a framework that incorporates all these innovative components and couples two bidirectional and collaborative streams, achieves state-of-the-art performance and generalizability across multiple public datasets, outperforming all existing prior arts. Specifically, it demonstrates accurate depth estimation in texture-less and dynamic regions, and shows more reasonable smoothness. Our source code is publicly available at <uri>https://mias.group/DCPI-Depth</uri>.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"4258-4272"},"PeriodicalIF":0.0,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144488040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}