Dual-channel prior-based deep unfolding with contrastive learning for underwater image enhancement
Thuy Thi Pham, Truong Thanh Nhat Mai, Hansung Yu, Chul Lee
Journal of Visual Communication and Image Representation, vol. 111, Article 104500 (published 2025-06-07). DOI: 10.1016/j.jvcir.2025.104500
Abstract: Underwater image enhancement (UIE) techniques aim to improve the visual quality of underwater images degraded by wavelength-dependent light absorption and scattering. In this work, we propose a deep unfolding approach for UIE to leverage the advantages of both model- and learning-based approaches while overcoming their weaknesses. Specifically, we first formulate the UIE task as a joint optimization problem with physics-based priors, providing a robust theoretical foundation on the properties of underwater imaging. Then, we define implicit regularizers to compensate for modeling inaccuracies in the physics-based priors and solve the optimization using an iterative technique. Finally, we unfold the iterative algorithm into a series of interconnected blocks, where each block represents a single iteration of the algorithm. We further improve performance by employing a contrastive learning strategy that learns discriminative representations between the underwater and clean images. Experimental results demonstrate that the proposed algorithm provides better enhancement performance than state-of-the-art algorithms.
{"title":"TapFace: A task-oriented facial privacy protection framework","authors":"Zhenni Liu , Yu Zhou , Ping Xiong , Qian Wang","doi":"10.1016/j.jvcir.2025.104497","DOIUrl":"10.1016/j.jvcir.2025.104497","url":null,"abstract":"<div><div>Deep learning has been widely employed in various face recognition and analysis tasks, highlighting the importance of facial privacy protection. Numerous facial de-identification methods have been proposed to maximize image utility and prevent disclosing private information. However, existing methods encounter various challenges due to the diversity in the definitions of facial privacy. Thus, these methods fail to adaptively cater to varying facial privacy protection requirements. Therefore, this paper introduces TapFace, a task-oriented facial privacy protection framework, that enables users to tailor task, privacy, and background attributes according to specific task demands. Specifically, the TapFace framework processes original images through image-guided generation and privacy attribute randomization, ensuring the preservation of task-relevant features while effectively anonymizing private information. The experimental results from multiple real-world datasets indicate that the proposed framework can adaptively protect facial privacy while fulfilling the images’ usability requirements during specific tasks.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104497"},"PeriodicalIF":2.6,"publicationDate":"2025-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144297363","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Intrinsic image decomposition based joint image enhancement and instance segmentation network for low-light images
Chonghao Liu, Yi Zhang, Sida Zheng, Jichang Guo
Journal of Visual Communication and Image Representation, vol. 111, Article 104498 (published 2025-06-06). DOI: 10.1016/j.jvcir.2025.104498
Abstract: Due to low brightness and contrast, low-light conditions present significant challenges for both low-level and high-level vision tasks. For instance segmentation, low-light scenes often result in incomplete objects and inaccurate edges. Existing methods typically treat low-light enhancement as a preprocessing step, adopting an "enhance-then-segment" pipeline, which reduces segmentation accuracy and neglects the information generated during segmentation that is useful for low-light image enhancement (LLIE). To address these issues, we propose a novel strategy that couples LLIE with instance segmentation in a cross-complementary manner, allowing the two tasks to mutually improve each other. Specifically, we first replace the traditional "enhance-then-segment" approach with a "decompose-then-segment" method that uses the reflectance map generated during the enhancement process as the input for instance segmentation. The details in the reflectance map are preserved by improving the decomposition loss functions, thus increasing segmentation accuracy. We then incorporate instance-level semantic information from the segmentation process through the proposed semantic feature fuse block (SFFB), which integrates semantic information into the feature representation space and guides the enhancement process to apply differential enhancement to regions based on their semantic content. In addition, we propose an instance-guided color histogram (ICH) loss function to maintain color consistency between the enhanced image and the ground truth across instances. Extensive experiments on the LIS dataset demonstrate the effectiveness and generality of our method.
{"title":"Masked latent transformer with random masking ratio to advance the diagnosis of dental fluorosis","authors":"Hao Xu , Yun Wu , Junpeng Wu , Rui Xie , Maohua Gu , Rongpin Wang","doi":"10.1016/j.jvcir.2025.104496","DOIUrl":"10.1016/j.jvcir.2025.104496","url":null,"abstract":"<div><div>Dental fluorosis is a chronic condition caused by long-term overconsumption of fluoride, which leads to changes in the appearance of tooth enamel. Diagnosing its severity can be challenging for dental professionals, and limited research on deep learning applications in this field. Therefore, we propose a novel deep learning model, masked latent transformer with random masking ratio (MLTrMR), to advance the diagnosis of dental fluorosis. MLTrMR enhances contextual learning by using a masked latent modeling scheme based on Vision Transformer. It extracts latent tokens from the original image with a latent embedder, processes unmasked tokens with a latent transformer (LT) block, and predicts masked tokens. To improve model performance, we incorporate an auxiliary loss function. MLTrMR achieves state-of-the-art results, with 80.19% accuracy, 75.79% F1 score, and 81.28% quadratic weighted kappa on the first open-source dental fluorosis image dataset (DFID) we constructed. The dataset and code are available at <span><span>https://github.com/uxhao-o/MLTrMR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"111 ","pages":"Article 104496"},"PeriodicalIF":2.6,"publicationDate":"2025-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144229482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
PA-INR: Parallel adapter-based storage of edited implicit neural representation
Mingyi Ma, Xinzui Wang, Yan Wang, Tongda Xu, Fucheng Cao, Shichen Su
Journal of Visual Communication and Image Representation, vol. 111, Article 104499 (published 2025-06-05). DOI: 10.1016/j.jvcir.2025.104499
Abstract: Implicit neural representations (INRs) have demonstrated strong potential in the field of image representation. Unlike traditional discrete representation methods, INRs map pixel coordinates to RGB values using a neural network, so data can be stored in a continuous form. Because INRs cannot be interpreted during training, current methods cannot effectively store associated INR weights, such as the network weights before and after image editing. As a result, storing two associated sets of INR weights requires the same space as storing two unrelated images. To address this issue, we propose a method based on parallel adapters, which efficiently stores multiple INRs through model fine-tuning. By storing the residuals between different INRs in parallel adapter structures, the storage space required for multiple associated INRs is significantly reduced. Furthermore, by parallelizing and merging adapter structures, our method achieves functionality similar to storing and merging editing histories. We conducted experiments on both images and videos, demonstrating that our method is fully compatible with existing weight training methods as well as methods that generate weights from hypernetworks. With our method, it is also possible to directly use a meta-network to generate residuals between INRs, allowing generalized direct editing while preserving the original INR structure.
Transferable targeted adversarial attack via multi-source perturbation generation and integration
Shihui Zhang, Shaojie Han, Sheng Yang, Xueqiang Han, Junbin Su, Gangzheng Zhai, Houlin Wang
Journal of Visual Communication and Image Representation, vol. 111, Article 104493 (published 2025-06-03). DOI: 10.1016/j.jvcir.2025.104493
Abstract: With the rapid development of artificial intelligence, deep learning models have been widely applied across society (e.g., for video and image representation). However, due to the presence of adversarial examples, these models exhibit obvious fragility, which has become a major challenge restricting their further development. Therefore, studying the generation process of adversarial examples and achieving high transferability are of utmost importance. In this paper, we propose a transferable targeted adversarial attack method called Multi-source Perturbation Generation and Integration (MPGI) to address the vulnerability and uncertainty of deep learning models. Specifically, MPGI consists of three critical designs to achieve targeted transferability of adversarial examples. Firstly, we propose a Collaborative Feature Fusion (CFF) component, which reduces the impact of the original example's features on model classification by considering collaboration in feature fusion. Subsequently, we propose a Multi-scale Perturbation Dynamic Fusion (MPDF) module that fuses perturbations from different scales to enrich perturbation diversity. Finally, we investigate a novel Logit Margin with Penalty (LMP) loss to further enhance the misleading ability of the examples. The LMP loss is a pluggable component that can be leveraged by different approaches to boost performance. In summary, MPGI can effectively achieve targeted attacks, expose the shortcomings of existing models, and promote the development of artificial intelligence in terms of security. Extensive experiments on the ImageNet-Compatible and CIFAR-10 datasets demonstrate the superiority of the proposed method. For instance, the attack success rate increases by 17.6% and 17.0% compared with the state-of-the-art method when transferred from DN-121 to the Inc-v3 and MB-v2 models.
An end-to-end robust feature learning method for face recognition
Menglong Yang, Hanyong Wang, Fangrui Wu, Xuebin Lv
Journal of Visual Communication and Image Representation, vol. 111, Article 104485 (published 2025-06-02). DOI: 10.1016/j.jvcir.2025.104485
Abstract: In deep learning-based face recognition, training a discriminative feature representation is challenging when dealing with noisy-labeled data. This paper introduces a feature learning method that is robust to such conditions. Our key contributions include an online data filtering algorithm that automatically separates correctly labeled data from noisy-labeled training data. Additionally, we propose a mechanism called online negative centers sampling (ONCS), which enlarges the feature-space distance between the samples of a class and the centers of other classes. With ONCS, all of the data, including the noisy-labeled data, can contribute to feature learning. We test our method by training a 128-D feature representation on the extremely noisy MS-Celeb-1M dataset, without any preprocessing such as pre-training or dataset cleaning. The result demonstrates an accuracy of 99.33% on the LFW test set with a single model and without landmark-based alignment, close to the result obtained with clean data.
Sync-4D: Monocular 4D reconstruction and generation with Synchronized Canonical Distillation
Xinzhou Wang, Kai Sun, Xudong Zhang, Fuchun Sun, Ling Wang, Bin He
Journal of Visual Communication and Image Representation, vol. 111, Article 104483 (published 2025-05-31). DOI: 10.1016/j.jvcir.2025.104483
Abstract: The development of video diffusion models and score distillation techniques has advanced dynamic 3D content generation. However, motion priors from video diffusion models have limited quality and temporal extent. Inspired by motion capture, we propose a text-to-4D framework that generates 4D content using skeletal animations extracted from monocular video. To enhance the 2D diffusion model for temporally consistent 4D generation, we establish inter-frame token correspondences through canonical coordinate matching and fuse diffusion features. We further propose Synchronized Canonical Distillation (SCD) from a gradient-based perspective. In the score-matching process, SCD computes gradients over articulated models and denoises both the canonical model and the motion field synchronously. By accumulating inter-frame and inter-view gradients, SCD mitigates multi-face artifacts and temporal inconsistencies, while diffusion priors further enhance consistency in unobserved regions. Experiments demonstrate that our method outperforms state-of-the-art monocular non-rigid reconstruction and 4D generation methods, achieving a 42.5% lower average Chamfer Distance.
{"title":"Multi-task visual food recognition by integrating an ontology supported with LLM","authors":"Daniel Ponte , Eduardo Aguilar , Mireia Ribera , Petia Radeva","doi":"10.1016/j.jvcir.2025.104484","DOIUrl":"10.1016/j.jvcir.2025.104484","url":null,"abstract":"<div><div>Food image analysis is a crucial task with far-reaching implications across various domains, including culinary arts, nutrition, and food technology. This paper presents a novel approach to multi-task visual food analysis, using large language models to obtain recipes and support the creation of a comprehensive food ontology. The approach integrates the food ontology into an end-to-end model, with prior knowledge on the relationships of food concepts at different semantic levels, within a multi-task deep learning visual food analysis approach, to generate better and more consistent class predictions. Evaluated on two benchmark datasets, MAFood-121 and VireoFood-172, this method demonstrates its effectiveness in single-label food recognition and multi-label food group classification. The ontology enhances accuracy, consistency, and generalization by effectively transferring knowledge to the learning model. This study underscores the potential of ontology-based methods to address food image classification complexities, with implications for broad applications, including automated recipe generation and nutritional assessment.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104484"},"PeriodicalIF":2.6,"publicationDate":"2025-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144178653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An unsupervised fine-tuning strategy for low-light image enhancement","authors":"Shaoping Xu , Qiyu Chen, Hanyang Hu, Liang Peng, Wuyong Tao","doi":"10.1016/j.jvcir.2025.104480","DOIUrl":"10.1016/j.jvcir.2025.104480","url":null,"abstract":"<div><div>The primary goal of low-light image enhancement (LLIE) algorithms is to improve the visibility of images taken in poor lighting conditions, thereby enhancing the performance of subsequent tasks. However, relying on a single LLIE algorithm often fails to consistently address aspects like color restoration, noise reduction, brightness adjustment, and detail preservation due to varying implementation strategies. To overcome this limitation, we propose an unsupervised fine-tuning strategy that integrates multiple LLIE methods for better and more comprehensive results. Our approach consists of two phases: in the preprocessing phase, we select two complementary LLIE algorithms, Retinexformer and RQ-LLIE, to process the input low-light image independently. The enhanced outputs are designated as preprocessed images. In the unsupervised fusion fine-tuning phase, a lightweight UNet network extracts features from these preprocessed images to produce a fused image, constrained by a hybrid loss function. This function ensures consistency in image content and adjusts quality based on color, spatial consistency, and exposure. We also employ an image quality screening mechanism to select the optimal final enhanced image from the iterative outputs. Extensive experiments on benchmark datasets confirm that our algorithm outperforms existing individual LLIE methods in both qualitative and quantitative evaluations. Moreover, our approach is highly extensible, allowing for the integration of future LLIE algorithms to achieve even better results.</div></div>","PeriodicalId":54755,"journal":{"name":"Journal of Visual Communication and Image Representation","volume":"110 ","pages":"Article 104480"},"PeriodicalIF":2.6,"publicationDate":"2025-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144134003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}