Pattern Recognition: Latest Articles

Cross-domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen CT imaging
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-19 | DOI: 10.1016/j.patcog.2025.111695
Qikui Zhu, Shaoming Zhu, Bo Du, Yanqing Wang
{"title":"Cross-domain distribution adversarial diffusion model for synthesizing contrast-enhanced abdomen CT imaging","authors":"Qikui Zhu ,&nbsp;Shaoming Zhu ,&nbsp;Bo Du ,&nbsp;Yanqing Wang","doi":"10.1016/j.patcog.2025.111695","DOIUrl":"10.1016/j.patcog.2025.111695","url":null,"abstract":"<div><div>Synthesizing contrast-enhanced CT imaging (CE-CT imaging) from non-contrast CT imaging (NC-CT) without the need for chemical contrast agents (CAs) injection holds significant clinical value, as CE-CT imaging plays a crucial role in diagnosing liver tumors, especially in identifying and distinguishing benign from malignant liver tumors. However, challenges within CT imaging, such as the low variability in intensity distribution and limited distribution changes, have hindered the effectiveness of existing synthetic methods, including GAN-based methods and diffusion model (DM)-based methods, in synthesizing CE-CT imaging. We propose a novel cross-domain distribution adversarial diffusion model (AdverDM) for CE-CT imaging synthesis, which overcomes the aforementioned challenges and facilitates the synthesis of CE-CT imaging. Our AdverDM incorporates three key innovations: (1) Cross-domain distribution adversarial learning is introduced into DM, enabling the utilization of cross-domain information to learn discriminative feature representations, addressing the limitations of existing DM based methods in capturing conceptually-aware discriminative features and extracting CA-aware feature representations. (2) A content-oriented diffusion model is creatively designed to guide tissue distribution learning, assisting DM in overcoming the challenge of low variability in intensity distribution. (3) A novel structure preservation loss is proposed to maintain the structural information, avoiding the problem of structural destruction faced by DMs. AdverDM is validated using corresponding two-modality CT images (pre-contrast and portal-venous phases), which is a clinically important procedure that benefits liver tumor biopsy. Experimental results (PSNR: 24.78, SSIM: 0.83, MAE: 6.94) demonstrate that our AdverDM successfully synthesizes CE-CT imaging without the need for chemical CAs injection. Moreover, AdverDM’s performance surpasses that of state-of-the-art synthetic methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111695"},"PeriodicalIF":7.5,"publicationDate":"2025-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Cross-scene visual context parsing with large vision-language model
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-18 | DOI: 10.1016/j.patcog.2025.111641
Guoqing Zhang, Shichao Kan, Lu Shi, Wanru Xu, Gaoyun An, Yigang Cen
{"title":"Cross-scene visual context parsing with large vision-language model","authors":"Guoqing Zhang ,&nbsp;Shichao Kan ,&nbsp;Lu Shi ,&nbsp;Wanru Xu ,&nbsp;Gaoyun An ,&nbsp;Yigang Cen","doi":"10.1016/j.patcog.2025.111641","DOIUrl":"10.1016/j.patcog.2025.111641","url":null,"abstract":"<div><div>Relation analysis is crucial for image-based applications such as visual reasoning and visual question answering. Current relation analysis such as scene graph generation (SGG) only focuses on building relationships among objects within a single image. However, in real-world applications, relationships among objects across multiple images, as seen in video understanding, may hold greater significance as they can capture global information. This is still a challenging and unexplored task. In this paper, we aim to explore the technique of Cross-Scene Visual Context Parsing (CS-VCP) using a large vision-language model. To achieve this, we first introduce a cross-scene dataset comprising 10,000 pairs of cross-scene visual instruction data, with each instruction describing the common knowledge of a pair of cross-scene images. We then propose a Cross-Scene Visual Symbiotic Linkage (CS-VSL) model to understand both cross-scene relationships and objects by analyzing the rationales in each scene. The model is pre-trained on 100,000 cross-scene image pairs and validated on 10,000 image pairs. Both quantitative and qualitative experiments demonstrate the effectiveness of the proposed method. Our method has been released on GitHub: <span><span>https://github.com/gavin-gqzhang/CS-VSL</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111641"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143868773","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Ricci curvature discretizations for head pose estimation from a single image
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-18 | DOI: 10.1016/j.patcog.2025.111648
Andrea Francesco Abate, Lucia Cascone, Michele Nappi
{"title":"Ricci curvature discretizations for head pose estimation from a single image","authors":"Andrea Francesco Abate,&nbsp;Lucia Cascone,&nbsp;Michele Nappi","doi":"10.1016/j.patcog.2025.111648","DOIUrl":"10.1016/j.patcog.2025.111648","url":null,"abstract":"<div><div>Head pose estimation (HPE) is crucial in various real-world applications, like human–computer interaction and biometric framework enhancement. This research aims to leverage network curvature to predict head pose from a single image. In networks, certain groups of nodes fulfill significant functional roles. This study focuses on the interactions of facial landmarks, considered as vertices in a weighted graph. The experiments demonstrate that the underlying graph geometry and topology enable the detection of similarities among various head poses. Two independent notions of discrete Ricci curvature for graphs, namely Ollivier–Ricci and Forman–Ricci curvatures, are investigated. These two types of Ricci curvature, each reflecting distinct geometric properties of the network, serve as inputs to the regression model. The results from the BIWI, AFLW2000, and Pointing‘04 datasets reveal that the two discretizations of Ricci’s curvature are closely related and outperform state-of-the-art methods, including both landmark-based and image-only approaches. This demonstrates the effectiveness and promise of using network curvature for HPE in diverse applications.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111648"},"PeriodicalIF":7.5,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gradient-based class weighting for unsupervised domain adaptation in dense prediction visual tasks
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-17 | DOI: 10.1016/j.patcog.2025.111633
Roberto Alcover-Couso, Marcos Escudero-Viñolo, Juan C. SanMiguel, Jesus Bescos
{"title":"Gradient-based class weighting for unsupervised domain adaptation in dense prediction visual tasks","authors":"Roberto Alcover-Couso ,&nbsp;Marcos Escudero-Viñolo,&nbsp;Juan C. SanMiguel,&nbsp;Jesus Bescos","doi":"10.1016/j.patcog.2025.111633","DOIUrl":"10.1016/j.patcog.2025.111633","url":null,"abstract":"<div><div>In unsupervised domain adaptation (UDA), where models are trained on source data (e.g., synthetic) and adapted to target data (e.g., real-world) without target annotations, addressing the challenge of significant class imbalance remains an open issue. Despite progress in bridging the domain gap, existing methods often experience performance degradation when confronted with highly imbalanced dense prediction visual tasks like semantic segmentation. This discrepancy becomes especially pronounced due to the lack of equivalent priors between the source and target domains, turning class imbalanced techniques used for other areas (e.g., image classification) ineffective in UDA scenarios. This paper proposes a class-imbalance mitigation strategy that incorporates class-weights into the UDA learning losses, with the novelty of estimating these weights dynamically through the gradients of the per-class losses, defining a Gradient-based class weighting (GBW) approach. The proposed GBW naturally increases the contribution of classes whose learning is hindered by highly-represented classes, and has the advantage of automatically adapting to training outcomes, avoiding explicit curricular learning patterns common in loss-weighing strategies. Extensive experimentation validates the effectiveness of GBW across architectures (Convolutional and Transformer), UDA strategies (adversarial, self-training and entropy minimization), tasks (semantic and panoptic segmentation), and datasets. Analysis shows that GBW consistently increases the recall of under-represented classes.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111633"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143839392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Unpaired recurrent learning for real-world video de-hazing
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-17 | DOI: 10.1016/j.patcog.2025.111698
Prashant W. Patil, Santosh Nagnath Randive, Sunil Gupta, Santu Rana, Svetha Venkatesh, Subrahmanyam Murala
{"title":"Unpaired recurrent learning for real-world video de-hazing","authors":"Prashant W. Patil ,&nbsp;Santosh Nagnath Randive ,&nbsp;Sunil Gupta ,&nbsp;Santu Rana ,&nbsp;Svetha Venkatesh ,&nbsp;Subrahmanyam Murala","doi":"10.1016/j.patcog.2025.111698","DOIUrl":"10.1016/j.patcog.2025.111698","url":null,"abstract":"<div><div>Automated outdoor vision-based applications have become increasingly in demand for day-to-day life. Bad weather like haze, rain, snow, <em>etc.</em> may limit the reliability of these applications due to degradation in the overall video quality. So, there is a dire need to pre-process the weather-degraded videos before they are fed to downstream applications. Researchers generally adopt synthetically generated paired hazy frames for learning the task of video de-hazing. The models trained solely on synthetic data may have limited performance on different types of real-world hazy scenarios due to significant domain gap between synthetic and real-world hazy videos. One possible solution is to prove the generalization ability by training on unpaired data for video de-hazing. Some unpaired learning approaches are proposed for single image de-hazing. However, these unpaired single image de-hazing approaches compromise the performance in terms of temporal consistency, which is important for video de-hazing tasks. With this motivation, we have proposed a lightweight and temporally consistent architecture for video de-hazing tasks. To achieve this, diverse receptive and multi-scale features at various input resolutions are mixed and aggregated with multi-kernel attention to extract significant haze information. Furthermore, we propose a recurrent multi-attentive feature alignment concept to maintain temporal consistency with recurrent feedback of previously restored frames for temporal consistent video restoration. Comprehensive experiments are conducted on real-world and synthetic video databases (REVIDE and RSA100Haze). Both the qualitative and quantitative results show significant improvement of the proposed network with better temporal consistency over state-of-the-art methods for detailed video restoration in hazy weather. Source code is available at: <span><span>https://github.com/pwp1208/UnpairedVideoDehazing</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111698"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Conformal e-prediction
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-17 | DOI: 10.1016/j.patcog.2025.111674
Vladimir Vovk
{"title":"Conformal e-prediction","authors":"Vladimir Vovk","doi":"10.1016/j.patcog.2025.111674","DOIUrl":"10.1016/j.patcog.2025.111674","url":null,"abstract":"<div><div>This paper discusses a counterpart of conformal prediction for e-values, <em>conformal e-prediction</em>. Conformal e-prediction is conceptually simpler and had been developed in the 1990s as a precursor of conformal prediction. When conformal prediction emerged as result of replacing e-values by p-values, it seemed to have important advantages over conformal e-prediction without obvious disadvantages. This paper re-examines relations between conformal prediction and conformal e-prediction systematically from a modern perspective. Conformal e-prediction has advantages of its own, such as the ease of designing conditional conformal e-predictors and the guaranteed validity of cross-conformal e-predictors (whereas for cross-conformal predictors validity is only an empirical fact and can be broken with excessive randomization). Even where conformal prediction has clear advantages, conformal e-prediction can often emulate those advantages, more or less successfully.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111674"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143843012","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Prototype-augmented mean teacher for robust semi-supervised medical image segmentation
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-17 | DOI: 10.1016/j.patcog.2025.111722
Huaikun Zhang, Pei Ma, Jizhao Liu, Jing Lian, Yide Ma
{"title":"Prototype-augmented mean teacher for robust semi-supervised medical image segmentation","authors":"Huaikun Zhang ,&nbsp;Pei Ma ,&nbsp;Jizhao Liu ,&nbsp;Jing Lian ,&nbsp;Yide Ma","doi":"10.1016/j.patcog.2025.111722","DOIUrl":"10.1016/j.patcog.2025.111722","url":null,"abstract":"<div><div>Semi-supervised learning has made significant progress in medical image segmentation, aiming to improve model performance with small amounts of labeled data and large amounts of unlabeled data. However, most existing methods focus too much on the supervision of label space and have insufficient supervision on feature space. Moreover, these methods generally focus on enhancing inter-class discrimination, ignoring the processing of intra-class variation, which has significant effects on fine-grained segmentation in complex medical images. To overcome these limitations, we propose a novel semi-supervised segmentation approach, Prototype-Augmented Mean Teacher (PAMT). Built upon the Mean Teacher framework, PAMT incorporates non-learnable prototypes to enhance feature space supervision. Specifically, we introduce two innovative loss functions: Prototype-Guided Pixel Classification (PGPC) Loss and Adaptive Prototype Contrastive (APC) Loss. PGPC Loss ensures pixel classification consistency with the nearest prototypes through a nearest-neighbor strategy, while APC Loss further captures intra-class variability, thereby improving the model's capacity to distinguish between pixels of the same class. By augmenting the Mean Teacher framework with prototype learning, PAMT not only improves feature representation and mitigates pseudo-label noise but also boosts segmentation accuracy and generalization, particularly in complex anatomical structures. Extensive experiments on three public datasets demonstrate that PAMT consistently surpasses state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111722"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143863638","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
BE-ECM: Belief Entropy-based Evidential C-Means and its application in data clustering
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-17 | DOI: 10.1016/j.patcog.2025.111676
Jixiang Deng, Guohui Zhou, Yong Deng, Kang Hao Cheong
{"title":"BE-ECM: Belief Entropy-based Evidential C-Means and its application in data clustering","authors":"Jixiang Deng ,&nbsp;Guohui Zhou ,&nbsp;Yong Deng ,&nbsp;Kang Hao Cheong","doi":"10.1016/j.patcog.2025.111676","DOIUrl":"10.1016/j.patcog.2025.111676","url":null,"abstract":"<div><div>As an extension of Fuzzy C-Means based on Dempster-Shafer evidence theory, Evidential C-Means (ECM) generalizes fuzzy partition to credal partition and has been widely applied. However, ECM’s objective function only considers distortion between objects and prototypes, making it highly sensitive to prototype initialization and prone to the local optima problem. While maximum entropy-based methods improve stability by entropy regularization, they are limited to fuzzy partition and cannot handle credal partition with multi-class uncertainty in evidential clustering. To overcome the issues, this paper proposes Belief Entropy-based Evidential C-Means (BE-ECM), which uniquely equips ECM with a belief entropy-based Maximum Entropy Principle (MEP) framework. Compared to ECM, BE-ECM considers not only the distortion term but also a negative belief entropy term, leveraging MEP to enhance stability against the local optimal problem. Unlike other maximum entropy-based methods, BE-ECM incorporates credal partition with belief entropy, enabling explicit multi-class uncertainty modeling and stable evidential clustering. During the clustering process of BE-ECM, the negative belief entropy term initially dominates to provide unbiased estimation for unknown data distributions, mitigating the impact of poorly initialized prototypes and reducing the risks of local optima, while the distortion term gradually refines the credal partition as clustering progresses. Experimental results demonstrate BE-ECM’s superior performance and high stability on clustering tasks compared with the existing clustering algorithms.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111676"},"PeriodicalIF":7.5,"publicationDate":"2025-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143876928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Class and Domain Low-rank Tensor Learning for Multi-source Domain Adaptation
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-16 | DOI: 10.1016/j.patcog.2025.111675
Yuwu Lu, Huiling Fu, Zhihui Lai, Xuelong Li
{"title":"Class and Domain Low-rank Tensor Learning for Multi-source Domain Adaptation","authors":"Yuwu Lu ,&nbsp;Huiling Fu ,&nbsp;Zhihui Lai ,&nbsp;Xuelong Li","doi":"10.1016/j.patcog.2025.111675","DOIUrl":"10.1016/j.patcog.2025.111675","url":null,"abstract":"<div><div>Multi-source unsupervised domain adaptation (MUDA) aims to transfer knowledge from multiple labeled source domains to an unlabeled target domain. A key challenge in MUDA is to minimize the distributional discrepancy between the source and target domains. While traditional methods typically merge source domains to reduce this discrepancy, they often overlook higher-order correlations and class-discriminative relationships across domains, which weakens the generalization and classification abilities of the model. To address these challenges, we propose a novel method called Class and Domain Low-rank Tensor Learning (CDLTL), which integrates domain-level alignment and class-level alignment into a unified framework. Specifically, CDLTL leverages a projection matrix to map data from both source and target domains into a shared subspace, enabling the reconstruction of target domain samples from the source data and thereby reducing domain discrepancies. By combining tensor learning with joint sparse and weighted low-rank constraints, CDLTL achieves domain-level alignment, allowing the model to capture complex higher-order correlations across multiple domains while preserving global structures within the data. CDLTL also takes into account the geometric structure of multiple source domains and preserves local structures through manifold learning. Additionally, CDLTL achieves class-level alignment through class-based low-rank constraints, which improve intra-class compactness and inter-class separability, thus boosting the discriminative ability and robustness of the model. Extensive experiments conducted across various visual domain adaptation tasks demonstrate that the proposed method outperforms some of the existing approaches.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111675"},"PeriodicalIF":7.5,"publicationDate":"2025-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143878533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
DecloudFormer: Quest the key to consistent thin cloud removal of wide-swath multi-spectral images
IF 7.5 | CAS Tier 1, Computer Science
Pattern Recognition | Pub Date: 2025-04-15 | DOI: 10.1016/j.patcog.2025.111664
Mingkai Li, Qizhi Xu, Kaiqi Li, Wei Li
{"title":"DecloudFormer: Quest the key to consistent thin cloud removal of wide-swath multi-spectral images","authors":"Mingkai Li ,&nbsp;Qizhi Xu ,&nbsp;Kaiqi Li ,&nbsp;Wei Li","doi":"10.1016/j.patcog.2025.111664","DOIUrl":"10.1016/j.patcog.2025.111664","url":null,"abstract":"<div><div>Wide-swath images contain clouds of various shapes and thicknesses. Existing methods have different thin cloud removal strengths in different patches of the wide-swath image. This leads to severe cross-patch color inconsistency in the thin cloud removal results of wide-swath images. To solve this problem, a DecloudFormer with cross-patch thin cloud removal consistency was proposed. First, a Group Layer Normalization (GLNorm) was proposed to preserve both the spatial and channel distribution of thin cloud. Second, a CheckerBoard Mask (CB Mask) was proposed to make the network focus on different cloud-covered areas of the image and extract local cloud features. Finally, a two-branch DecloudFormer Block containing the CheckerBoard Attention (CBA) was proposed to fuse the global cloud features and local cloud features to reduce the cross-patch color difference. DecloudFormer and compared methods were tested for simulated thin cloud removal performance on images from QuickBird, GaoFen-2, and WorldView-2 satellites, and for real thin cloud removal performance on images from Landsat-8 satellite. The experiment results demonstrated that DecloudFormer outperformed the existing State-Of-The-Art (SOTA) methods. Furthermore, DecloudFormer makes it possible to process thin cloud covered wide-swath image using a small video memory GPU. The source code are available at <span><span>the link</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111664"},"PeriodicalIF":7.5,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143859307","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0