{"title":"Cross-domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen CT imaging","authors":"Qikui Zhu , Shaoming Zhu , Bo Du , Yanqing Wang","doi":"10.1016/j.patcog.2025.111695","DOIUrl":"10.1016/j.patcog.2025.111695","url":null,"abstract":"<div><div>Synthesizing contrast-enhanced CT imaging (CE-CT imaging) from non-contrast CT imaging (NC-CT) without the need for chemical contrast agents (CAs) injection holds significant clinical value, as CE-CT imaging plays a crucial role in diagnosing liver tumors, especially in identifying and distinguishing benign from malignant liver tumors. However, challenges within CT imaging, such as the low variability in intensity distribution and limited distribution changes, have hindered the effectiveness of existing synthetic methods, including GAN-based methods and diffusion model (DM)-based methods, in synthesizing CE-CT imaging. We propose a novel cross-domain distribution adversarial diffusion model (AdverDM) for CE-CT imaging synthesis, which overcomes the aforementioned challenges and facilitates the synthesis of CE-CT imaging. Our AdverDM incorporates three key innovations: (1) Cross-domain distribution adversarial learning is introduced into DM, enabling the utilization of cross-domain information to learn discriminative feature representations, addressing the limitations of existing DM based methods in capturing conceptually-aware discriminative features and extracting CA-aware feature representations. (2) A content-oriented diffusion model is creatively designed to guide tissue distribution learning, assisting DM in overcoming the challenge of low variability in intensity distribution. (3) A novel structure preservation loss is proposed to maintain the structural information, avoiding the problem of structural destruction faced by DMs. AdverDM is validated using corresponding two-modality CT images (pre-contrast and portal-venous phases), which is a clinically important procedure that benefits liver tumor biopsy. Experimental results (PSNR: 24.78, SSIM: 0.83, MAE: 6.94) demonstrate that our AdverDM successfully synthesizes CE-CT imaging without the need for chemical CAs injection. Moreover, AdverDM’s performance surpasses that of state-of-the-art synthetic methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111695"},"PeriodicalIF":7.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-scene visual context parsing with large vision-language model","authors":"Guoqing Zhang , Shichao Kan , Lu Shi , Wanru Xu , Gaoyun An , Yigang Cen","doi":"10.1016/j.patcog.2025.111641","DOIUrl":"10.1016/j.patcog.2025.111641","url":null,"abstract":"<div><div>Relation analysis is crucial for image-based applications such as visual reasoning and visual question answering. Current relation analysis such as scene graph generation (SGG) only focuses on building relationships among objects within a single image. However, in real-world applications, relationships among objects across multiple images, as seen in video understanding, may hold greater significance as they can capture global information. This is still a challenging and unexplored task. In this paper, we aim to explore the technique of Cross-Scene Visual Context Parsing (CS-VCP) using a large vision-language model. To achieve this, we first introduce a cross-scene dataset comprising 10,000 pairs of cross-scene visual instruction data, with each instruction describing the common knowledge of a pair of cross-scene images. We then propose a Cross-Scene Visual Symbiotic Linkage (CS-VSL) model to understand both cross-scene relationships and objects by analyzing the rationales in each scene. The model is pre-trained on 100,000 cross-scene image pairs and validated on 10,000 image pairs. Both quantitative and qualitative experiments demonstrate the effectiveness of the proposed method. Our method has been released on GitHub: <span><span>https://github.com/gavin-gqzhang/CS-VSL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111641"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ricci curvature discretizations for head pose estimation from a single image","authors":"Andrea Francesco Abate, Lucia Cascone, Michele Nappi","doi":"10.1016/j.patcog.2025.111648","DOIUrl":"10.1016/j.patcog.2025.111648","url":null,"abstract":"<div><div>Head pose estimation (HPE) is crucial in various real-world applications, like human–computer interaction and biometric framework enhancement. This research aims to leverage network curvature to predict head pose from a single image. In networks, certain groups of nodes fulfill significant functional roles. This study focuses on the interactions of facial landmarks, considered as vertices in a weighted graph. The experiments demonstrate that the underlying graph geometry and topology enable the detection of similarities among various head poses. Two independent notions of discrete Ricci curvature for graphs, namely Ollivier–Ricci and Forman–Ricci curvatures, are investigated. These two types of Ricci curvature, each reflecting distinct geometric properties of the network, serve as inputs to the regression model. The results from the BIWI, AFLW2000, and Pointing‘04 datasets reveal that the two discretizations of Ricci’s curvature are closely related and outperform state-of-the-art methods, including both landmark-based and image-only approaches. This demonstrates the effectiveness and promise of using network curvature for HPE in diverse applications.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111648"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gradient-based class weighting for unsupervised domain adaptation in dense prediction visual tasks","authors":"Roberto Alcover-Couso , Marcos Escudero-Viñolo, Juan C. SanMiguel, Jesus Bescos","doi":"10.1016/j.patcog.2025.111633","DOIUrl":"10.1016/j.patcog.2025.111633","url":null,"abstract":"<div><div>In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, addressing the challenge of significant class imbalance remains an open issue. Despite progress in bridging the domain gap, existing methods often experience performance degradation when confronted with highly imbalanced dense prediction visual tasks like semantic segmentation. This discrepancy becomes especially pronounced due to the lack of equivalent priors between the source and target domains, turning class imbalanced techniques used for other areas (e.g., image classification) ineffective in UDA scenarios. This paper proposes a class-imbalance mitigation strategy that incorporates class-weights into the UDA learning losses, with the novelty of estimating these weights dynamically through the gradients of the per-class losses, defining a Gradient-based class weighting (GBW) approach. The proposed GBW naturally increases the contribution of classes whose learning is hindered by highly-represented classes, and has the advantage of automatically adapting to training outcomes, avoiding explicit curricular learning patterns common in loss-weighing strategies. Extensive experimentation validates the effectiveness of GBW across architectures (Convolutional and Transformer), UDA strategies (adversarial, self-training and entropy minimization), tasks (semantic and panoptic segmentation), and datasets. Analysis shows that GBW consistently increases the recall of under-represented classes.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111633"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unpaired recurrent learning for real-world video de-hazing","authors":"Prashant W. Patil , Santosh Nagnath Randive , Sunil Gupta , Santu Rana , Svetha Venkatesh , Subrahmanyam Murala","doi":"10.1016/j.patcog.2025.111698","DOIUrl":"10.1016/j.patcog.2025.111698","url":null,"abstract":"<div><div>Automated outdoor vision-based applications have become increasingly in demand for day-to-day life. Bad weather like haze, rain, snow, <em>etc.</em> may limit the reliability of these applications due to degradation in the overall video quality. So, there is a dire need to pre-process the weather-degraded videos before they are fed to downstream applications. Researchers generally adopt synthetically generated paired hazy frames for learning the task of video de-hazing. The models trained solely on synthetic data may have limited performance on different types of real-world hazy scenarios due to significant domain gap between synthetic and real-world hazy videos. One possible solution is to prove the generalization ability by training on unpaired data for video de-hazing. Some unpaired learning approaches are proposed for single image de-hazing. However, these unpaired single image de-hazing approaches compromise the performance in terms of temporal consistency, which is important for video de-hazing tasks. With this motivation, we have proposed a lightweight and temporally consistent architecture for video de-hazing tasks. To achieve this, diverse receptive and multi-scale features at various input resolutions are mixed and aggregated with multi-kernel attention to extract significant haze information. Furthermore, we propose a recurrent multi-attentive feature alignment concept to maintain temporal consistency with recurrent feedback of previously restored frames for temporal consistent video restoration. Comprehensive experiments are conducted on real-world and synthetic video databases (REVIDE and RSA100Haze). Both the qualitative and quantitative results show significant improvement of the proposed network with better temporal consistency over state-of-the-art methods for detailed video restoration in hazy weather. Source code is available at: <span><span>https://github.com/pwp1208/UnpairedVideoDehazing</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111698"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conformal e-prediction","authors":"Vladimir Vovk","doi":"10.1016/j.patcog.2025.111674","DOIUrl":"10.1016/j.patcog.2025.111674","url":null,"abstract":"<div><div>This paper discusses a counterpart of conformal prediction for e-values, <em>conformal e-prediction</em>. Conformal e-prediction is conceptually simpler and had been developed in the 1990s as a precursor of conformal prediction. When conformal prediction emerged as result of replacing e-values by p-values, it seemed to have important advantages over conformal e-prediction without obvious disadvantages. This paper re-examines relations between conformal prediction and conformal e-prediction systematically from a modern perspective. Conformal e-prediction has advantages of its own, such as the ease of designing conditional conformal e-predictors and the guaranteed validity of cross-conformal e-predictors (whereas for cross-conformal predictors validity is only an empirical fact and can be broken with excessive randomization). Even where conformal prediction has clear advantages, conformal e-prediction can often emulate those advantages, more or less successfully.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111674"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Prototype-augmented mean teacher for robust semi-supervised medical image segmentation","authors":"Huaikun Zhang , Pei Ma , Jizhao Liu , Jing Lian , Yide Ma","doi":"10.1016/j.patcog.2025.111722","DOIUrl":"10.1016/j.patcog.2025.111722","url":null,"abstract":"<div><div>Semi-supervised learning has made significant progress in medical image segmentation, aiming to improve model performance with small amounts of labeled data and large amounts of unlabeled data. However, most existing methods focus too much on the supervision of label space and have insufficient supervision on feature space. Moreover, these methods generally focus on enhancing inter-class discrimination, ignoring the processing of intra-class variation, which has significant effects on fine-grained segmentation in complex medical images. To overcome these limitations, we propose a novel semi-supervised segmentation approach, Prototype-Augmented Mean Teacher (PAMT). Built upon the Mean Teacher framework, PAMT incorporates non-learnable prototypes to enhance feature space supervision. Specifically, we introduce two innovative loss functions: Prototype-Guided Pixel Classification (PGPC) Loss and Adaptive Prototype Contrastive (APC) Loss. PGPC Loss ensures pixel classification consistency with the nearest prototypes through a nearest-neighbor strategy, while APC Loss further captures intra-class variability, thereby improving the model's capacity to distinguish between pixels of the same class. By augmenting the Mean Teacher framework with prototype learning, PAMT not only improves feature representation and mitigates pseudo-label noise but also boosts segmentation accuracy and generalization, particularly in complex anatomical structures. Extensive experiments on three public datasets demonstrate that PAMT consistently surpasses state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111722"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BE-ECM: Belief Entropy-based Evidential C-Means and its application in data clustering","authors":"Jixiang Deng , Guohui Zhou , Yong Deng , Kang Hao Cheong","doi":"10.1016/j.patcog.2025.111676","DOIUrl":"10.1016/j.patcog.2025.111676","url":null,"abstract":"<div><div>As an extension of Fuzzy C-Means based on Dempster-Shafer evidence theory, Evidential C-Means (ECM) generalizes fuzzy partition to credal partition and has been widely applied. However, ECM’s objective function only considers distortion between objects and prototypes, making it highly sensitive to prototype initialization and prone to the local optima problem. While maximum entropy-based methods improve stability by entropy regularization, they are limited to fuzzy partition and cannot handle credal partition with multi-class uncertainty in evidential clustering. To overcome the issues, this paper proposes Belief Entropy-based Evidential C-Means (BE-ECM), which uniquely equips ECM with a belief entropy-based Maximum Entropy Principle (MEP) framework. Compared to ECM, BE-ECM considers not only the distortion term but also a negative belief entropy term, leveraging MEP to enhance stability against the local optimal problem. Unlike other maximum entropy-based methods, BE-ECM incorporates credal partition with belief entropy, enabling explicit multi-class uncertainty modeling and stable evidential clustering. During the clustering process of BE-ECM, the negative belief entropy term initially dominates to provide unbiased estimation for unknown data distributions, mitigating the impact of poorly initialized prototypes and reducing the risks of local optima, while the distortion term gradually refines the credal partition as clustering progresses. Experimental results demonstrate BE-ECM’s superior performance and high stability on clustering tasks compared with the existing clustering algorithms.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111676"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Class and Domain Low-rank Tensor Learning for Multi-source Domain Adaptation","authors":"Yuwu Lu , Huiling Fu , Zhihui Lai , Xuelong Li","doi":"10.1016/j.patcog.2025.111675","DOIUrl":"10.1016/j.patcog.2025.111675","url":null,"abstract":"<div><div>Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from multiple labeled source domains to an unlabeled target domain. A key challenge in MUDA is to minimize the distributional discrepancy between the source and target domains. While traditional methods typically merge source domains to reduce this discrepancy, they often overlook higher-order correlations and class-discriminative relationships across domains, which weakens the generalization and classification abilities of the model. To address these challenges, we propose a novel method called Class and Domain Low-rank Tensor Learning (CDLTL), which integrates domain-level alignment and class-level alignment into a unified framework. Specifically, CDLTL leverages a projection matrix to map data from both source and target domains into a shared subspace, enabling the reconstruction of target domain samples from the source data and thereby reducing domain discrepancies. By combining tensor learning with joint sparse and weighted low-rank constraints, CDLTL achieves domain-level alignment, allowing the model to capture complex higher-order correlations across multiple domains while preserving global structures within the data. CDLTL also takes into account the geometric structure of multiple source domains and preserves local structures through manifold learning. Additionally, CDLTL achieves class-level alignment through class-based low-rank constraints, which improve intra-class compactness and inter-class separability, thus boosting the discriminative ability and robustness of the model. Extensive experiments conducted across various visual domain adaptation tasks demonstrate that the proposed method outperforms some of the existing approaches.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111675"},"PeriodicalIF":7.5,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DecloudFormer: Quest the key to consistent thin cloud removal of wide-swath multi-spectral images","authors":"Mingkai Li , Qizhi Xu , Kaiqi Li , Wei Li","doi":"10.1016/j.patcog.2025.111664","DOIUrl":"10.1016/j.patcog.2025.111664","url":null,"abstract":"<div><div>Wide-swath images contain clouds of various shapes and thicknesses. Existing methods have different thin cloud removal strengths in different patches of the wide-swath image. This leads to severe cross-patch color inconsistency in the thin cloud removal results of wide-swath images. To solve this problem, a DecloudFormer with cross-patch thin cloud removal consistency was proposed. First, a Group Layer Normalization (GLNorm) was proposed to preserve both the spatial and channel distribution of thin cloud. Second, a CheckerBoard Mask (CB Mask) was proposed to make the network focus on different cloud-covered areas of the image and extract local cloud features. Finally, a two-branch DecloudFormer Block containing the CheckerBoard Attention (CBA) was proposed to fuse the global cloud features and local cloud features to reduce the cross-patch color difference. DecloudFormer and compared methods were tested for simulated thin cloud removal performance on images from QuickBird, GaoFen-2, and WorldView-2 satellites, and for real thin cloud removal performance on images from Landsat-8 satellite. The experiment results demonstrated that DecloudFormer outperformed the existing State-Of-The-Art (SOTA) methods. Furthermore, DecloudFormer makes it possible to process thin cloud covered wide-swath image using a small video memory GPU. The source code are available at <span><span>the link</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111664"},"PeriodicalIF":7.5,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}