{"title":"Dual-function discriminator for semantic image synthesis in variational GANs","authors":"Aihua Ke , Bo Cai , Yujie Huang , Jian Luo , Yaoxiang Yu , Le Li","doi":"10.1016/j.patcog.2025.111684","DOIUrl":"10.1016/j.patcog.2025.111684","url":null,"abstract":"<div><div>Semantic image synthesis aims to generate target images conditioned on given semantic labels, but existing methods often struggle with maintaining high visual quality and accurate semantic alignment. To address these challenges, we propose VD-GAN, a novel framework that integrates advanced architectural and functional innovations. Our variational generator, built on an enhanced U-Net architecture combining a pre-trained Swin transformer and CNN, captures both global and local semantic features, generating high-quality images. To further boost performance, we design two innovative modules: the Conditional Residual Attention Module (CRAM) for dimensionality reduction modulation and the Channel and Spatial Attention Mechanism (CSAM) for extracting key semantic relationships across channel and spatial dimensions. Additionally, we introduce a dual-function discriminator that not only distinguishes real and synthesized images, but also performs multi-class segmentation on synthesized images, guided by a redefined class-balanced cross-entropy loss to ensure semantic consistency. Extensive experiments show that VD-GAN outperforms the latest supervised methods, with improvements of (FID, mIoU, Acc) by (5.40%, 4.37%, 1.48%) and increases in auxiliary metrics (LPIPS, TOPIQ) by (2.45%, 23.52%). 
The code will be available at <span><span>https://github.com/ah-ke/VD-GAN.git</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111684"},"PeriodicalIF":7.5,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143851757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving imbalanced medical image classification through GAN-based data augmentation methods","authors":"Hongwei Ding , Nana Huang , Yaoxin Wu , Xiaohui Cui","doi":"10.1016/j.patcog.2025.111680","DOIUrl":"10.1016/j.patcog.2025.111680","url":null,"abstract":"<div><div>In the medical field, there exists a prevalent issue of data imbalance, severely impacting the performance of machine learning models. Traditional data augmentation methods struggle to generate augmented samples with strong diversity. Generative Adversarial Networks (GANs) can produce more effective new samples by learning the global distribution of the data. Although existing GAN models can balance inter-class distributions, the presence of sparse samples within classes can lead to intra-class mode collapse, rendering them unable to effectively fit the sparse region distribution. Based on this, our study proposes a two-step solution. Firstly, we employ a Cluster-Based Local Outlier Factor (CBLOF) algorithm to identify sparse and dense samples within each class. Then, using these sparse and dense samples as conditions, we train the GAN model to better focus on fitting the sparse samples within each class. Finally, after training the GAN model, we propose using the One-Class SVM (OCS) algorithm as a noise filter to obtain pure augmented samples. We conducted extensive validation experiments on four medical datasets: BloodMNIST, OrganCMNIST, PathMNIST, and PneumoniaMNIST. The experimental results indicate that the method proposed in this study can generate samples with greater diversity and higher quality. 
Furthermore, by incorporating augmented samples, the accuracy improved by approximately 3% across four datasets.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111680"},"PeriodicalIF":7.5,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143824357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
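The two-step pipeline in the abstract above (CBLOF-style sparse/dense splitting, then a One-Class SVM noise filter on the synthetic samples) can be sketched with off-the-shelf components. This is a minimal illustration, not the authors' implementation: the GAN is omitted (noisy copies of real features stand in for generated samples), and a simple cluster-size criterion stands in for the full CBLOF score.

```python
# Hedged sketch of the two-step idea: (1) flag sparse vs. dense samples
# per class via a CBLOF-style cluster-size criterion, (2) filter
# synthetic samples with a One-Class SVM trained on real data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_real = rng.normal(0, 1, size=(200, 8))  # real minority-class features (toy data)

# Step 1: cluster the class; members of small clusters are treated as "sparse".
km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(X_real)
sizes = np.bincount(km.labels_, minlength=5)
sparse_mask = sizes[km.labels_] < np.median(sizes)
print("sparse:", int(sparse_mask.sum()), "dense:", int((~sparse_mask).sum()))

# Step 2: OCS noise filter -- keep only synthetic samples the one-class
# model judges consistent with the real distribution (predict == +1).
ocs = OneClassSVM(nu=0.1, gamma="scale").fit(X_real)
X_fake = X_real + rng.normal(0, 0.3, size=X_real.shape)  # stand-in for GAN output
kept = X_fake[ocs.predict(X_fake) == 1]
print("kept", len(kept), "of", len(X_fake), "synthetic samples")
```

In the real pipeline, the sparse/dense flag from step 1 would be fed to the GAN as a condition, so the generator is pushed to cover the sparse regions rather than collapsing onto the dense ones.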
{"title":"Domain generalization for image classification with dynamic decision boundary","authors":"Zhiming Cheng , Mingxia Liu , Defu Yang , Zhidong Zhao , Chenggang Yan , Shuai Wang","doi":"10.1016/j.patcog.2025.111678","DOIUrl":"10.1016/j.patcog.2025.111678","url":null,"abstract":"<div><div>Domain Generalization (DG) has been widely used in image classification tasks to effectively handle distribution shifts between source and target domains without accessing target domain data. Traditional DG methods typically rely on static models trained on the source domain for inference on unseen target domains, limiting their ability to fully leverage target domain characteristics. Test-Time Adaptation (TTA)-based DG methods improve generalization performance by adapting the model during inference using target domain samples. However, this often requires parameter fine-tuning on unseen target domains during inference, which may lead to forgetting of source domain knowledge or reduced real-time performance. To address this limitation, we propose a Dynamic Decision Boundary-based DG (DDB-DG) method for image classification, which effectively leverages target domain characteristics during inference without requiring additional training. In the proposed DDB-DG, we first introduce a Prototype-guided Multi-level Prediction (PMP) module, which guides the dynamic adjustment of the decision boundary learned from the source domain by leveraging the correlation between test samples and prototypes. To enhance the accuracy of prototype computation, we also propose a data augmentation method called Uncertainty Style Mixture (USM), which expands the diversity of training samples to improve model generalization performance and enhance the accuracy of pseudo-labeling for target domain samples in prototypes. We validate DDB-DG using different backbone networks on three publicly available benchmark datasets: PACS, Office-Home, and VLCS. 
Experimental results demonstrate that our method achieves superior performance on both ResNet-18 and ResNet-50, surpassing the state-of-the-art DG and TTA methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111678"},"PeriodicalIF":7.5,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143824342","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring dynamic plane representations for neural scene reconstruction","authors":"Ruihong Yin , Yunlu Chen , Sezer Karaoglu , Theo Gevers","doi":"10.1016/j.patcog.2025.111683","DOIUrl":"10.1016/j.patcog.2025.111683","url":null,"abstract":"<div><div>The efficient tri-plane representations present limited expressivity for encoding complex 3D scenes. To cope with the hampered spatial expressivity of tri-planes, this paper proposes a novel dynamic plane representation method for 3D scene reconstruction, including dynamic long-axis plane learning, a point-to-plane relationship module, and explicit coarse-to-fine feature projection. First, the proposed dynamic long-axis plane learning employs several planes along the principal axis and adapts planar positions dynamically, which can enhance geometry expressivity. Second, a point-to-plane relationship module is proposed to capture distinguished point features by learning the feature bias between plane features and point features. Third, the explicit coarse-to-fine feature projection employs a non-linear transformation to capture fine features from learnable coarse features, exploiting both local and global information with fewer increases in parameters. Experimental results on ScanNet and 7-Scenes demonstrate that our method achieves state-of-the-art performance with comparable computational costs.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111683"},"PeriodicalIF":7.5,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143834061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LLDiffusion: Learning degradation representations in diffusion models for low-light image enhancement","authors":"Tao Wang , Kaihao Zhang , Yong Zhang , Wenhan Luo , Björn Stenger , Tong Lu , Tae-Kyun Kim , Wei Liu","doi":"10.1016/j.patcog.2025.111628","DOIUrl":"10.1016/j.patcog.2025.111628","url":null,"abstract":"<div><div>Current deep learning methods for low-light image enhancement typically rely on pixel-wise mappings using paired data, often overlooking the specific degradation factors inherent to low-light conditions, such as noise amplification, reduced contrast, and color distortion. This oversight can result in suboptimal performance. To address this limitation, we propose a degradation-aware learning framework that explicitly integrates degradation representations into the model design. We introduce LLDiffusion, a novel model composed of three key modules: a Degradation Generation Network (DGNET), a Dynamic Degradation-Aware Diffusion Module (DDDM), and a Latent Map Encoder (E). This approach enables joint learning of degradation representations, with the pre-trained Encoder (E) and DDDM effectively incorporating degradation and image priors into the diffusion process for improved enhancement. Extensive experiments on public benchmarks show that LLDiffusion outperforms state-of-the-art low-light image enhancement methods quantitatively and qualitatively. 
The source code and pre-trained models will be available at <span><span>https://github.com/TaoWangzj/LLDiffusion</span></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111628"},"PeriodicalIF":7.5,"publicationDate":"2025-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MDSI: Pluggable Multi-strategy Decoupling with Semantic Integration for RGB-D Gesture Recognition","authors":"Fengyi Fang , Zihan Liao , Zhehan Kan , Guijin Wang , Wenming Yang","doi":"10.1016/j.patcog.2025.111653","DOIUrl":"10.1016/j.patcog.2025.111653","url":null,"abstract":"<div><div>Gestures encompass intricate visual representations, containing both task-relevant cues such as hand shapes and task-irrelevant elements like backgrounds and performer appearances. Despite progress in RGB-D-based gesture recognition, two primary challenges persist: (i) <em>Information Redundancy</em> (IR), which hinders the task-relevant feature extraction in the entangled space and misleads the recognition; (ii) <em>Information Absence</em> (IA), which exacerbates the difficulty of identifying visually similar instances. To alleviate these drawbacks, we propose a pluggable Multi-strategy Decoupling with Semantic Integration methodology, termed MDSI, for RGB-D gesture recognition. For IR, we introduce a Multi-strategy Decoupling Network (MDN) to precisely segregate pose-motion and spatial-temporal-channel features across modalities, thus effectively mitigating redundant information. For IA, we introduce the Semantic Integration Network (SIN), which integrates natural language modeling through semantic filtering and semantic label smoothing, markedly enhancing the model’s semantic understanding and knowledge integration. MDSI’s pluggable architecture allows for seamless integration into various RGB-D-based gesture recognition methods with minimal computational overhead. 
Experiments conducted on two public datasets demonstrate that our approach provides better feature representation and achieves better performance than state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111653"},"PeriodicalIF":7.5,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143855749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Co-evidential fusion with information volume for semi-supervised medical image segmentation","authors":"Yuanpeng He , Lijian Li , Tianxiang Zhan , Chi-Man Pun , Wenpin Jiao , Zhi Jin","doi":"10.1016/j.patcog.2025.111639","DOIUrl":"10.1016/j.patcog.2025.111639","url":null,"abstract":"<div><div>Although existing semi-supervised image segmentation methods have achieved good performance, they cannot effectively utilize multiple sources of voxel-level uncertainty for targeted learning. Therefore, we propose two main improvements. First, we introduce a novel pignistic co-evidential fusion strategy using generalized evidential deep learning, extended by traditional D–S evidence theory, to obtain a more precise uncertainty measure for each voxel in medical samples. This assists the model in learning mixed labeled information and establishing semantic associations between labeled and unlabeled data. Second, we introduce the concept of information volume of mass function (IVUM) to evaluate the constructed evidence, implementing two evidential learning schemes. One optimizes evidential deep learning by combining the information volume of the mass function with original uncertainty measures. The other integrates the learning pattern based on the co-evidential fusion strategy, using IVUM to design a new optimization objective. 
Experiments on four datasets demonstrate the competitive performance of our method.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111639"},"PeriodicalIF":7.5,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143821295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Corner selection and dual network blender for efficient view synthesis in outdoor scenes","authors":"Mohannad Al-Jaafari , Firas Abedi , You Yang , Qiong Liu","doi":"10.1016/j.patcog.2025.111668","DOIUrl":"10.1016/j.patcog.2025.111668","url":null,"abstract":"<div><div>Novel view synthesis (NVS) from freely distributed viewpoints in large-scale outdoor scenes offers a compelling user experience in several applications, including first-person hyper-lapse videos and virtual reality. However, high-quality NVS requires dense input images, which can be affected by color discrepancies caused by extreme brightness changes, varying viewing angles, incorrect estimation of camera parameters, and mobile objects. This paper introduces Competent View Synthesis (CVS), a cost-effective approach to generating high-quality NVS from large-scale outdoor scenes to address these challenges. CVS employs a three-stage pipeline, including a Corners Selection Algorithm (CSA) to reduce the number of required input images, a Tinkering mechanism to fill in missing pixel data, and a dual-network blending (DNB) model to fuse colors and calculate attention coefficients for feature refinement. The experimental results demonstrate the effectiveness of the proposed CVS in generating realistic viewpoints from a limited number of input viewpoints. Furthermore, comparative evaluations against two baselines using metrics such as PSNR, SSIM, and LPIPS reveal significant performance improvements of 4.6%, 0.18%, and 30.13%, respectively, over the first baseline, while the proposed method significantly outperforms the second baseline in terms of perceptual metrics. 
By optimizing the synthesis process for complex outdoor scenes, CVS enhances the quality of generated images and improves computational efficiency.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111668"},"PeriodicalIF":7.5,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143851756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spectral Contrastive Clustering","authors":"Jerome Williams, Antonio Robles-Kelly","doi":"10.1016/j.patcog.2025.111671","DOIUrl":"10.1016/j.patcog.2025.111671","url":null,"abstract":"<div><div>We combine online spectral clustering and contrastive representation learning into a novel deep clustering algorithm that can be used for unsupervised image classification. We estimate a spectral embedding using minibatches. Spectral cluster assignments are used by a pairwise contrastive loss to update the model’s latent space, allowing our spectral embedding to adapt over time. We obtain competitive unsupervised classification performance purely by applying K-Means to our spectral embedding. Unlike competing methods, our approach does not require strong augmentations, class-balancing penalties, offline example mining or softmax classifiers.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111671"},"PeriodicalIF":7.5,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839395","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
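The final assignment step described in the Spectral Contrastive Clustering abstract above — plain K-Means applied to a spectral embedding — can be illustrated in a few lines. This sketch covers only that step on toy data; the paper's minibatch (online) spectral estimation and the pairwise contrastive loss that adapts the latent space are not reproduced here.

```python
# Minimal sketch of "K-Means on a spectral embedding" using toy 2D data.
# The online estimation and contrastive-loss components are omitted.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.manifold import SpectralEmbedding
from sklearn.cluster import KMeans

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

# Embed points via the graph Laplacian of a k-NN affinity graph.
emb = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_transform(X)

# Cluster assignments come from K-Means in the embedded space.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(emb)

# Agreement with ground truth, up to label permutation.
acc = max(np.mean(labels == y), np.mean(labels != y))
print(f"clustering accuracy: {acc:.2f}")
```

The two interleaved moons are not linearly separable in input space, but the spectral embedding unfolds them so that K-Means separates the clusters cleanly, which is the intuition behind reading cluster assignments directly off the learned embedding.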
{"title":"Skew-probabilistic neural networks for learning from imbalanced data","authors":"Shraddha M. Naik , Tanujit Chakraborty , Madhurima Panja , Abdenour Hadid , Bibhas Chakraborty","doi":"10.1016/j.patcog.2025.111677","DOIUrl":"10.1016/j.patcog.2025.111677","url":null,"abstract":"<div><div>Real-world datasets often exhibit imbalanced data distribution, where certain class levels are severely underrepresented. In such cases, traditional pattern classifiers have shown a bias towards the majority class, impeding accurate predictions for the minority class. This paper introduces an imbalanced data-oriented classifier using probabilistic neural networks (PNN) with a skew-normal kernel function to address this major challenge. PNN is known for providing probabilistic outputs, enabling quantification of prediction confidence, interpretability, and the ability to handle limited data. By leveraging the skew-normal distribution, which offers increased flexibility, particularly for imbalanced and non-symmetric data, our proposed Skew-Probabilistic Neural Networks (SkewPNN) can better represent underlying class densities. Hyperparameter fine-tuning is imperative to optimize the performance of the proposed approach on imbalanced datasets. To this end, we employ a population-based heuristic algorithm, the Bat optimization algorithm, to explore the hyperparameter space effectively. We also prove the statistical consistency of the density estimates, suggesting that the true distribution will be approached smoothly as the sample size increases. Theoretical analysis of the computational complexity of the proposed SkewPNN and BA-SkewPNN is also provided. Numerical simulations have been conducted on different synthetic datasets, comparing various benchmark-imbalanced learners. 
Real-data analysis on several datasets shows that SkewPNN and BA-SkewPNN substantially outperform most state-of-the-art machine-learning methods for both balanced and imbalanced datasets (binary and multi-class categories) in most experimental settings.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"165 ","pages":"Article 111677"},"PeriodicalIF":7.5,"publicationDate":"2025-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143815696","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
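The skew-normal kernel underpinning the SkewPNN abstract above has a standard closed form; as general background (stated here from the statistics literature, not taken from the paper), the standard skew-normal density with shape parameter <em>α</em> is

```latex
f(z;\alpha) = 2\,\phi(z)\,\Phi(\alpha z),
\qquad
\phi(z) = \frac{1}{\sqrt{2\pi}}\,e^{-z^{2}/2},
\qquad
\Phi(t) = \int_{-\infty}^{t} \phi(u)\,du,
```

where α = 0 recovers the symmetric Gaussian kernel of a classical PNN, and nonzero α skews the kernel toward one tail — the added flexibility the abstract credits for better fitting imbalanced, non-symmetric class densities.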