{"title":"Novel Computational Photography for Soft-Focus Effect in Automatic Post Production","authors":"Hao-Yu Tsai;Morris C.-H. Tsai;Scott C.-H. Huang;Hsiao-Chun Wu","doi":"10.1109/TIP.2025.3562071","DOIUrl":"10.1109/TIP.2025.3562071","url":null,"abstract":"The well-known soft-focus effect, which relies on either special optical filters or manual post-production techniques, has been intriguing and powerful in photography for quite a while. Nonetheless, how to impose the soft-focus effect automatically simply using sophisticated image-processing (computational photography) algorithms has never been addressed in the literature to the best of our knowledge. In this work, we would like to make the first-ever attempt to design an automatic, optical-filter-free approach to create the appropriate soft-focus effects desired by individual users. Our approach is first to investigate the physical optical filter, namely <italic>Kenko Black Mist No. 5</i>, and estimate the corresponding kernel matrix (i.e., the system impulse response matrix) using our proposed novel irradiance-domain kernel-matrix estimation framework. Furthermore, we demonstrate that it is not feasible to find a kernel matrix that precisely characterizes the soft-focus effect by just using a pixel-value-domain image (a regular photo) in post production. To combat the aforementioned problem, we establish a novel pixel-value-to-pseudo-irradiance map such that the pseudo irradiance-domain image can be obtained directly from any pixel-value-domain image. Finally the soft-focus effect can be created from the two-dimensional convolution between the pseudo irradiance-domain image and the estimated kernel. To evaluate our proposed automatic scheme for soft-focus effect, we compare the results from our proposed new scheme and the physical optical filter in terms of the DCT-KLD (Kullback-Leibler divergence of discrete cosine transform) and the conventional PSNR (peak-signal-to-noise ratio). Experiments show that our proposed new scheme can achieve very small DCT-KLDs and very large PSNRs over the ground truth, namely the results from the physical optical filter.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2560-2574"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143866985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-Modal Hashing via Diverse Instances Matching","authors":"Junfeng Tu;Xueliang Liu;Zhen Huang;Yanbin Hao;Richang Hong;Meng Wang","doi":"10.1109/TIP.2025.3561659","DOIUrl":"10.1109/TIP.2025.3561659","url":null,"abstract":"Cross-modal hashing is a highly effective technique for searching relevant data across different modalities, owing to its low storage costs and fast similarity retrieval capability. While significant progress has been achieved in this area, prior investigations predominantly concentrate on a one-to-one feature alignment approach, where a singular feature is derived for similarity retrieval. However, the singular feature in these methods fails to adequately capture the varied multi-instance information inherent in the original data across disparate modalities. Consequently, the conventional one-to-one methodology is plagued by a semantic mismatch issue, as the rigid one-to-one alignment inhibits effective multi-instance matching. To address this issue, we propose a novel Diverse Instances Matching for Cross-modal Hashing (DIMCH), which explores the relevance between multiple instances in different modalities using a multi-instance learning algorithm. Specifically, we design a novel diverse instances learning module to extract a multi-feature set, which enables our model to capture detailed multi-instance semantics. To evaluate the similarity between two multi-feature sets, we adopt the smooth chamfer distance function, which enables our model to incorporate the conventional similarity retrieval structure. Moreover, to sufficiently exploit the supervised information from the semantic label, we adopt the weight cosine triplet loss as the objective function, which incorporates the multilevel similarity among the multi-labels into the training procedure and enables the model to mine the multi-label correlation effectively. Extensive experiments demonstrate that our diverse hashing embedding method achieves state-of-the-art performance in supervised cross-modal hashing retrieval tasks.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2737-2749"},"PeriodicalIF":0.0,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143866835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamical Threshold-Based Fractional Anisotropic Diffusion for Speckle Noise Removal","authors":"Jiali Wei;Xiaofeng Liao","doi":"10.1109/TIP.2025.3561685","DOIUrl":"10.1109/TIP.2025.3561685","url":null,"abstract":"The study of effective methods for removing image speckle remains a significant challenge in image processing. In contrast to additive noise, speckle noise is a multiplicative noise whose intensity is proportional to the signal. This results in a noise distribution that exhibits a high dependence on the signal intensity throughout the image, rendering it difficult to remove. Therefore, we present a novel approach to speckle noise removal using dynamical threshold–based fractional anisotropic diffusion (named as DTFAD) in this study. The method simultaneously considers both gradient and gray scale information in the image. In addition, the fractional derivative is integrated with anisotropic diffusion in the DTFAD model, which enhances the image denoising effect to preserve the fundamental features and edges of the image. The design of a dynamic threshold function in the diffusion coefficient enables the diffusion pattern and intensity to adaptively change according to image information, thus effectively removing speckle noise. We establish the well–posedness of the DTFAD model and implement it using an explicit finite difference scheme. Extensive experiments demonstrate that the DTFAD model outperforms traditional anisotropic diffusion techniques, and achieves a superior balance between denoising performance and texture preservation. This evidence demonstrates that the DTFAD model has the potential to be applied in practical engineering.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2826-2839"},"PeriodicalIF":0.0,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143862084","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Source-Free Cross-Modal Knowledge Transfer by Unleashing the Potential of Task-Irrelevant Data","authors":"Jinjing Zhu;Yucheng Chen;Lin Wang","doi":"10.1109/TIP.2025.3561670","DOIUrl":"10.1109/TIP.2025.3561670","url":null,"abstract":"Source-free cross-modal knowledge transfer is a crucial yet challenging task, which aims to transfer knowledge from one source modality (e.g., RGB) to the target modality (e.g., depth or infrared) with no access to the task-relevant (TR) source data due to memory and privacy concerns. A recent attempt leverages the paired task-irrelevant (TI) data and directly matches the features from them to eliminate the modality gap. However, it ignores a pivotal clue that the paired TI data could be utilized to effectively estimate the source data distribution and better facilitate knowledge transfer to the target modality. To this end, we propose a novel yet concise framework to unlock the potential of paired TI data for enhancing source-free cross-modal knowledge transfer. Our work is buttressed by two key technical components. Firstly, to better estimate the source data distribution, we introduce a Task-irrelevant data-Guided Modality Bridging (TGMB) module. It translates the target modality data into the source-like images based on paired TI data and the guidance of the available source model to alleviate two key gaps: 1) inter-modality gap between the paired TI data; 2) intra-modality gap between TI and TR target data. We then propose a Task-irrelevant data-Guided Knowledge Transfer (TGKT) module that transfers knowledge from the source model to the target model by leveraging the paired TI data. Notably, due to the unavailability of labels for the TR target data and its less reliable prediction from the source model, our TGKT model incorporates a self-supervised pseudo-labeling approach to enable the target model to learn from its predictions. Extensive experiments show that our method achieves state-of-the-art performance on three datasets (RGB-to-depth and RGB-to-infrared).","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2840-2852"},"PeriodicalIF":0.0,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143862081","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Disentangled Noisy Correspondence Learning","authors":"Zhuohang Dang;Minnan Luo;Jihong Wang;Chengyou Jia;Haochen Han;Herun Wan;Guang Dai;Xiaojun Chang;Jingdong Wang","doi":"10.1109/TIP.2025.3559457","DOIUrl":"10.1109/TIP.2025.3559457","url":null,"abstract":"Cross-modal retrieval is crucial in understanding latent correspondences across modalities. However, existing methods implicitly assume well-matched training data, which is impractical as real-world data inevitably involves imperfect alignments, i.e., noisy correspondences. Although some works explore similarity-based strategies to address such noise, they suffer from sub-optimal similarity predictions influenced by modality-exclusive information (MEI), e.g., background noise in images and abstract definitions in texts. This issue arises as MEI is not shared across modalities, thus aligning it in training can markedly mislead similarity predictions. Moreover, although intuitive, directly applying previous cross-modal disentanglement methods suffers from limited noise tolerance and disentanglement efficacy. Inspired by the robustness of information bottlenecks against noise, we introduce DisNCL, a novel information-theoretic framework for feature Disentanglement in Noisy Correspondence Learning, to adaptively balance the extraction of modality-invariant information (MII) and MEI with certifiable optimal cross-modal disentanglement efficacy. DisNCL then enhances similarity predictions in modality-invariant subspace, thereby greatly boosting similarity-based alleviation strategy for noisy correspondences. Furthermore, DisNCL introduces soft matching targets to model noisy many-to-many relationships inherent in multi-modal inputs for noise-robust and accurate cross-modal alignment. Extensive experiments confirm DisNCL’s efficacy by 2% average recall improvement. Mutual information estimation and visualization results show that DisNCL learns meaningful MII/MEI subspaces, validating our theoretical analyses.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2602-2615"},"PeriodicalIF":0.0,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143858029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unsupervised Range-Nullspace Learning Prior for Multispectral Images Reconstruction","authors":"Yurong Chen;Yaonan Wang;Hui Zhang","doi":"10.1109/TIP.2025.3560430","DOIUrl":"10.1109/TIP.2025.3560430","url":null,"abstract":"Snapshot Spectral Imaging (SSI) techniques, with the ability to capture both spectral and spatial information in a single exposure, have been found useful in a wide range of applications. SSI systems generally operate within the ‘encoding-decoding’ framework, leveraging the synergism of optical hardware and reconstruction algorithms. Typically, reconstructing desired spectral images from SSI measurements is an ill-posed and challenging problem. Existing studies utilize either model-based or deep learning-based methods, but both have their drawbacks. Model-based algorithms suffer from high computational costs, while supervised learning-based methods rely on large paired training data. In this paper, we propose a novel Unsupervised range-Nullspace learning (UnNull) prior for spectral image reconstruction. UnNull explicitly models the data via subspace decomposition, offering enhanced interpretability and generalization ability. Specifically, UnNull considers that the spectral images can be decomposed into the range and null subspaces. The features projected onto the range subspace are mainly low-frequency information, while features in the nullspace represent high-frequency information. Comprehensive multispectral demosaicing and reconstruction experiments demonstrate the superior performance of our proposed algorithm.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2513-2528"},"PeriodicalIF":0.0,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143849780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NP-Hand: Novel Perspective Hand Image Synthesis Guided by Normals","authors":"Binghui Zuo;Wenqian Sun;Zimeng Zhao;Xiaohan Yuan;Yangang Wang","doi":"10.1109/TIP.2025.3560241","DOIUrl":"10.1109/TIP.2025.3560241","url":null,"abstract":"Synthesizing multi-view images that are geometrically consistent with a given single-view image is one of the hot issues in AIGC in recent years. Existing methods have achieved impressive performance on objects with symmetry or rigidity, but they are inappropriate for the human hand. Because an image-captured human hand has more diverse poses and less attractive textures. In this paper, we propose NP-Hand, a framework that elegantly combines the diffusion model and generative adversarial network: The multi-step diffusion is trained to synthesize low-resolution novel perspective, while the single-step generator is exploited to further enhance synthesis quality. To maintain the consistency between inputs and synthesis, we creatively introduce normal maps into NP-Hand to guide the whole synthesizing process. Comprehensive evaluations have demonstrated that the proposed framework is superior to existing state-of-the-art models and more suitable for synthesizing hand images with faithful structures and realistic appearance details. The code will be released on our website.","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2435-2449"},"PeriodicalIF":0.0,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143849779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lightweight RGB-D Salient Object Detection From a Speed-Accuracy Tradeoff Perspective","authors":"Songsong Duan;Xi Yang;Nannan Wang;Xinbo Gao","doi":"10.1109/TIP.2025.3560488","DOIUrl":"10.1109/TIP.2025.3560488","url":null,"abstract":"Current RGB-D methods usually leverage large-scale backbones to improve accuracy but sacrifice efficiency. Meanwhile, several existing lightweight methods are difficult to achieve high-precision performance. To balance the efficiency and performance, we propose a Speed-Accuracy Tradeoff Network (SATNet) for Lightweight RGB-D SOD from three fundamental perspectives: depth quality, modality fusion, and feature representation. Concerning depth quality, we introduce the Depth Anything Model to generate high-quality depth maps,which effectively alleviates the multi-modal gaps in the current datasets. For modality fusion, we propose a Decoupled Attention Module (DAM) to explore the consistency within and between modalities. Here, the multi-modal features are decoupled into dual-view feature vectors to project discriminable information of feature maps. For feature representation, we develop a Dual Information Representation Module (DIRM) with a bi-directional inverted framework to enlarge the limited feature space generated by the lightweight backbones. DIRM models texture features and saliency features to enrich feature space, and employ two-way prediction heads to optimal its parameters through a bi-directional backpropagation. Finally, we design a Dual Feature Aggregation Module (DFAM) in the decoder to aggregate texture and saliency features. Extensive experiments on five public RGB-D SOD datasets indicate that the proposed SATNet excels state-of-the-art (SOTA) CNN-based heavyweight models and achieves a lightweight framework with 5.2 M parameters and 415 FPS. The code is available at <uri>https://github.com/duan-song/SATNet</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2529-2543"},"PeriodicalIF":0.0,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143849731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PreCM: The Padding-Based Rotation Equivariant Convolution Mode for Semantic Segmentation","authors":"Xinyu Xu;Huazhen Liu;Tao Zhang;Huilin Xiong;Wenxian Yu","doi":"10.1109/TIP.2025.3558425","DOIUrl":"10.1109/TIP.2025.3558425","url":null,"abstract":"Semantic segmentation is an important branch of image processing and computer vision. With the popularity of deep learning, various convolutional neural networks have been proposed for pixel-level classification and segmentation tasks. In practical scenarios, however, imaging angles are often arbitrary, encompassing instances such as water body images from remote sensing and capillary and polyp images in the medical domain, where prior orientation information is typically unavailable to guide these networks to extract more effective features. In this case, learning features from objects with diverse orientation information poses a significant challenge, as the majority of CNN-based semantic segmentation networks lack rotation equivariance to resist the disturbance from orientation information. To address this challenge, this paper first constructs a universal convolution-group framework aimed at more fully utilizing orientation information and equipping the network with rotation equivariance. Subsequently, we mathematically design a padding-based rotation equivariant convolution mode (PreCM), which is not only applicable to multi-scale images and convolutional kernels but can also serve as a replacement component for various types of convolutions, such as dilated convolutions, transposed convolutions, and asymmetric convolution. To quantitatively assess the impact of image rotation in semantic segmentation tasks, we also propose a new evaluation metric, Rotation Difference (RD). The replacement experiments related to six existing semantic segmentation networks on three datasets (i.e., Satellite Images of Water Bodies, DRIVE, and Floodnet) show that, the average Intersection Over Union (IOU) of their PreCM-based versions respectively improve 6.91%, 10.63%, 4.53%, 5.93%, 7.48%, 8.33% compared to their original versions in terms of random angle rotation. And the average RD values are decreased by 3.58%, 4.56%, 3.47%, 3.66%, 3.47%, 3.43% respectively. The code can be download from <uri>https://github.com/XinyuXu414</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2781-2795"},"PeriodicalIF":0.0,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143849686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeRF-Det++: Incorporating Semantic Cues and Perspective-Aware Depth Supervision for Indoor Multi-View 3D Detection","authors":"Chenxi Huang;Yuenan Hou;Weicai Ye;Di Huang;Xiaoshui Huang;Binbin Lin;Deng Cai","doi":"10.1109/TIP.2025.3560240","DOIUrl":"10.1109/TIP.2025.3560240","url":null,"abstract":"NeRF-Det has achieved impressive performance in indoor multi-view 3D detection by innovatively utilizing NeRF to enhance representation learning. Despite its notable performance, we uncover three decisive shortcomings in its current design, including semantic ambiguity, inappropriate sampling, and insufficient utilization of depth supervision. To combat the aforementioned problems, we present three corresponding solutions: 1) Semantic Enhancement. We project the freely available 3D segmentation annotations onto the 2D plane and leverage the corresponding 2D semantic maps as the supervision signal, significantly enhancing the semantic awareness of multi-view detectors. 2) Perspective-Aware Sampling. Instead of employing the uniform sampling strategy, we put forward the perspective-aware sampling policy that samples densely near the camera while sparsely in the distance, more effectively collecting the valuable geometric clues. 3) Ordinal Residual Depth Supervision. As opposed to directly regressing the depth values that are difficult to optimize, we divide the depth range of each scene into a fixed number of ordinal bins and reformulate the depth prediction as the combination of the classification of depth bins as well as the regression of the residual depth values, thereby benefiting the depth learning process. The resulting algorithm, NeRF-Det++, has exhibited appealing performance in the ScanNetV2 and ARKITScenes datasets. Notably, in ScanNetV2, NeRF-Det++ outperforms the competitive NeRF-Det by +1.9% in mAP<inline-formula> <tex-math>$text{@}0.25$ </tex-math></inline-formula> and +3.5% in mAP<inline-formula> <tex-math>$text{@}0.50$ </tex-math></inline-formula>. The code will be publicly available at <uri>https://github.com/mrsempress/NeRF-Detplusplus</uri>","PeriodicalId":94032,"journal":{"name":"IEEE transactions on image processing : a publication of the IEEE Signal Processing Society","volume":"34 ","pages":"2575-2587"},"PeriodicalIF":0.0,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143847260","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}