Latest Articles in Pattern Recognition

DSDC-NET: Semi-supervised superficial OCTA vessel segmentation for false positive reduction
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-20 · DOI: 10.1016/j.patcog.2025.111592
Xinyi Liu, Hailan Shen, Wenyan Zhong, Wanqing Xiong, Zailiang Chen

Abstract: Accurate vessel segmentation in Optical Coherence Tomography Angiography (OCTA) is essential for ocular disease diagnosis, monitoring, and treatment assessment. However, most current automatic segmentation methods overlook false positives in the segmentation results, leading to potential misdiagnosis and delayed treatment. To address this issue, we propose Dynamic Spatial Semi-Supervised Vessel Segmentation with Dual Topological Consistency (DSDC-NET) for retinal superficial OCTA images. The network integrates a Dynamic Spatial Attention Mechanism that combines snake-shaped convolution, which captures fine tubular structures, with spatial attention to suppress background noise and artefacts. This design enhances vessel region responses while accurately capturing complex local structures, thereby reducing false positives arising from inaccurate localisation of vessel details. Furthermore, a Dual Topological Consistency Loss integrates the Persistent Homology features of the vessel system with the topological skeleton features of major vessels, enhancing branching pattern recognition. A warm-up mechanism balances the network's focus between major and branch vessels across training phases, mitigating false positives caused by inadequate learning of branching structures. Comprehensive evaluations on the ROSE-1, OCTA-500, and ROSSA datasets demonstrate the superiority of DSDC-NET over existing methods. Notably, DSDC-NET effectively reduces the false discovery rate and improves segmentation accuracy, validating its effectiveness in reducing false positives.

Citations: 0
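The snake-shaped convolution and persistent-homology loss are beyond a short sketch, but the spatial-attention gating the abstract describes can be illustrated in a few lines of NumPy. This is a generic CBAM-style sketch under our own assumptions, not the authors' implementation; a real model would learn a convolution over the pooled maps rather than summing them.

```python
import numpy as np

def spatial_attention(feat):
    """CBAM-style spatial attention: pool across channels, then gate each
    spatial location so background responses are suppressed.

    feat: (C, H, W) feature map. Returns a gated map of the same shape.
    """
    avg_pool = feat.mean(axis=0, keepdims=True)   # (1, H, W)
    max_pool = feat.max(axis=0, keepdims=True)    # (1, H, W)
    # A trained model applies a learned conv to [avg; max]; we just sum.
    logits = avg_pool + max_pool
    gate = 1.0 / (1.0 + np.exp(-logits))          # sigmoid gate in (0, 1)
    return feat * gate                            # broadcast over channels
```

Because the gate lies strictly between 0 and 1, every response is attenuated rather than amplified; the network relies on the surrounding layers to re-scale the kept vessel responses.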
Disentanglement and codebook learning-induced feature match network to diagnose neurodegenerative diseases on incomplete multimodal data
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-20 · DOI: 10.1016/j.patcog.2025.111597
Wei Xiong, Tao Wang, Xiumei Chen, Yue Zhang, Wencong Zhang, Qianjin Feng, Meiyan Huang, Alzheimer’s Disease Neuroimaging Initiative

Abstract: Multimodal data can provide complementary information for diagnosing neurodegenerative diseases (NDs). However, image quality variations and high costs can result in missing data. Although incomplete multimodal data can be projected onto a common space, the traditional projection process may increase alignment errors and lose some modality-specific information. A disentanglement and codebook learning-induced feature match network (DCFMnet) is proposed in this study to address these issues. First, multimodal data are disentangled into latent modality-common and modality-specific features to help preserve modality-specific information during the subsequent alignment of multimodal data. Second, the latent modal features of all available data are aligned into a common space to reduce alignment errors and fused to achieve ND diagnosis. Moreover, the latent modal features of modalities with missing data are recovered from online-updated feature codebooks. Finally, DCFMnet is tested on two publicly available datasets, demonstrating its excellent performance in ND diagnosis.

Citations: 0
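The codebook-lookup idea for a missing modality can be sketched with a vector-quantization-style nearest-neighbour match: each available latent feature is matched to its closest entry in a learned codebook for the missing modality. This is a minimal NumPy illustration of the general mechanism, not DCFMnet's actual network; the codebook here is an arbitrary array standing in for a learned one.

```python
import numpy as np

def codebook_match(queries, codebook):
    """Impute features by replacing each query with its nearest (L2)
    codebook entry, as in vector quantization.

    queries:  (N, D) latent features of the available modality
    codebook: (K, D) feature codebook for the missing modality
    Returns the (N, D) imputed features and the (N,) codeword indices.
    """
    # Pairwise squared distances via ||a-b||^2 = ||a||^2 - 2ab + ||b||^2
    d2 = (queries ** 2).sum(1, keepdims=True) \
         - 2.0 * queries @ codebook.T \
         + (codebook ** 2).sum(1)
    idx = d2.argmin(axis=1)
    return codebook[idx], idx
```

In a trained model the codebook entries would be updated online from complete-data samples, so the lookup returns plausible features for the absent modality.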
MSKA: Multi-stream keypoint attention network for sign language recognition and translation
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-19 · DOI: 10.1016/j.patcog.2025.111602
Mo Guan, Yan Wang, Guangkun Ma, Jiarui Liu, Mingzu Sun

Abstract: Sign language serves as a non-vocal means of communication, conveying information and meaning through gestures, facial expressions, and body movements. The majority of current approaches for sign language recognition (SLR) and translation rely on RGB video inputs, which are vulnerable to fluctuations in the background. A keypoint-based strategy not only mitigates the effects of background alterations but also substantially reduces the computational demands of the model. Nevertheless, contemporary keypoint-based methods fail to fully harness the implicit knowledge embedded in keypoint sequences. To tackle this challenge, we draw inspiration from the human cognition mechanism, which discerns sign language by analyzing the interplay between gesture configurations and supplementary elements. We propose a multi-stream keypoint attention network to model sequences of keypoints produced by a readily available keypoint estimator. To facilitate interaction across multiple streams, we investigate diverse mechanisms such as keypoint fusion strategies, head fusion, and self-distillation. The resulting framework, denoted MSKA-SLR, is expanded into a sign language translation (SLT) model through the straightforward addition of an extra translation network. We carry out comprehensive experiments on well-known benchmarks such as Phoenix-2014, Phoenix-2014T, and CSL-Daily to showcase the efficacy of our methodology. Notably, we attain new state-of-the-art performance on the Phoenix-2014T sign language translation task. The code and models are available at: https://github.com/sutwangyan/MSKA.

Citations: 0
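The "head fusion" step, in the generic late-fusion sense, can be sketched by averaging the per-stream classifier probabilities. This is our own minimal illustration of multi-stream fusion, not the MSKA architecture; the weighting scheme shown is a plain (optionally weighted) mean.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def head_fusion(stream_logits, weights=None):
    """Fuse per-stream classification heads by weighted-averaging their
    softmax probabilities.

    stream_logits: list of (N, C) logit arrays, one per keypoint stream
                   (e.g. hands, face, body).
    Returns (N, C) fused class probabilities.
    """
    probs = np.stack([softmax(l) for l in stream_logits])   # (S, N, C)
    if weights is None:
        weights = np.full(len(stream_logits), 1.0 / len(stream_logits))
    return np.tensordot(weights, probs, axes=1)             # (N, C)
```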
Integrated subset selection and bandwidth estimation algorithm for geographically weighted regression
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-19 · DOI: 10.1016/j.patcog.2025.111589
Hyunwoo Lee, Young Woong Park

Abstract: This study proposes a mathematical programming-based algorithm for the integrated selection of variable subsets and bandwidth estimation in geographically weighted regression, a local regression method that allows the kernel bandwidth and regression coefficients to vary across study areas. Unlike standard approaches in the literature, in which bandwidth and regression parameters are estimated separately for each focal point on the basis of different criteria, our model uses a single objective function for the integrated estimation of regression and bandwidth parameters across all focal points, based on the regression likelihood function and variance modeling. The proposed model further integrates a procedure to select a single subset of independent variables for all focal points, whereas existing approaches may return heterogeneous subsets across focal points. We then propose an alternating direction method to solve the nonconvex mathematical model and show that it converges to a partial minimum. The computational experiments indicate that the proposed algorithm provides competitive explanatory power with stable spatially varying patterns, with the ability to select the best subset and account for additional constraints.

Citations: 0
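The baseline that the paper integrates, a per-focal-point weighted least-squares fit with a Gaussian distance kernel, is compact enough to sketch. This is the standard GWR local fit under our own assumptions (Gaussian kernel, normal equations), not the authors' integrated mathematical program.

```python
import numpy as np

def gwr_fit_point(X, y, coords, focal, bandwidth):
    """Fit a local regression at one focal point via weighted least squares.

    X:      (N, P) design matrix (include an intercept column yourself)
    y:      (N,)   responses
    coords: (N, 2) observation locations
    focal:  (2,)   focal point at which coefficients are estimated
    bandwidth: Gaussian kernel width; nearer observations weigh more.
    Returns the (P,) local coefficient vector.
    """
    d = np.linalg.norm(coords - focal, axis=1)
    w = np.exp(-0.5 * (d / bandwidth) ** 2)       # Gaussian kernel weights
    Xw = X * w[:, None]
    # Solve the weighted normal equations (X' W X) beta = X' W y.
    beta, *_ = np.linalg.lstsq(Xw.T @ X, Xw.T @ y, rcond=None)
    return beta
```

Repeating this fit at every focal point, each time re-tuning the bandwidth by cross-validation, is exactly the separate, per-point estimation the paper replaces with a single objective.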
Multi-modal hypergraph contrastive learning for medical image segmentation
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-19 · DOI: 10.1016/j.patcog.2025.111544
Weipeng Jing, Junze Wang, Donglin Di, Dandan Li, Yang Song, Lei Fan

Abstract: Self-supervised learning (SSL) has become a dominant approach in multi-modal medical image segmentation. However, existing methods, such as Seq SSL and Joint SSL, suffer from catastrophic forgetting and conflicts in representation learning across different modalities. To address these challenges, we propose a two-stage SSL framework, HyCon, for multi-modal medical image segmentation. It combines the advantages of Seq and Joint SSL, using knowledge distillation to align topologically similar samples across modalities. In the first stage, cross-modal features are learned through adversarial learning. Inspired by Graph Foundation Models and adapted to our task, the Hypergraph Contrastive Learning Network (HCLN), with a teacher-student architecture, is subsequently introduced to capture high-order relationships across modalities by integrating hypergraphs with contrastive learning. The Topology Hybrid Distillation (THD) module distills topological information, contextual features, and relational knowledge into the student model. We evaluated HyCon on two organs (lung and brain). Our framework outperformed state-of-the-art SSL methods, achieving significant improvements in segmentation with limited labeled data. Both quantitative and qualitative experiments validate the effectiveness of our framework's design. Code is available at: https://github.com/reeive/HyCon.

Citations: 0
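The contrastive component at the heart of such frameworks is usually an InfoNCE-style loss: each anchor embedding should match its own positive against every other positive in the batch. The sketch below is the generic InfoNCE formulation in NumPy, not HyCon's hypergraph variant; the temperature value is an arbitrary illustrative choice.

```python
import numpy as np

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss over a batch of paired embeddings.

    anchors, positives: (N, D) L2-normalised embeddings, row i of each
    forming a positive pair; all other rows act as negatives.
    Returns the scalar loss (lower when pairs are well matched).
    """
    logits = anchors @ positives.T / temperature        # (N, N) similarities
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                  # cross-entropy on the diagonal
```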
Learning position-aware implicit neural network for real-world face inpainting
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-18 · DOI: 10.1016/j.patcog.2025.111598
Bo Zhao, Huan Yang, Jianlong Fu

Abstract: Face inpainting requires the model to have a precise global understanding of facial position structure. Benefiting from the powerful capabilities of deep learning backbones, recent works in face inpainting have achieved decent performance in an ideal setting (square 512 px images). However, existing methods often produce visually unpleasant results, especially in position-sensitive details (e.g., eyes and nose), when applied directly to arbitrary-shaped images in real-world scenarios. These visually unpleasant position-sensitive details reveal the shortcomings of existing methods in processing position information. In this paper, we propose an Implicit Neural Inpainting Network (IN2) that handles arbitrary-shaped face images in real-world scenarios by explicitly modeling position information. Specifically, a downsample processing encoder is proposed to reduce information loss while obtaining the global semantic feature. A neighbor hybrid attention block with a hybrid attention mechanism is proposed to improve the model’s facial understanding without restricting the input’s shape. Finally, an implicit neural pyramid decoder is introduced to explicitly model position information and bridge the gap between low-resolution features and high-resolution output. Our method achieves optimal facial image restoration performance on both the CelebA-HQ and LFW datasets, as well as on the downstream task of face verification, bringing a more efficient face inpainting algorithm to image editing software and intelligent security.

Citations: 0
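Implicit neural decoders typically feed pixel coordinates to an MLP, and the standard way to make those coordinates expressive is a Fourier positional encoding: sin/cos features at octave frequencies. This sketch shows that encoding step only, under our own assumptions; it is not the IN2 decoder itself.

```python
import numpy as np

def fourier_features(coords, n_freqs=4):
    """Fourier positional encoding of 2D coordinates, the usual trick that
    lets an implicit MLP represent position-sensitive detail at arbitrary
    resolution.

    coords: (N, 2) pixel coordinates normalised to [0, 1].
    Returns (N, 2 + 4 * n_freqs): raw coords plus sin/cos at frequencies
    pi * 2^k for k = 0 .. n_freqs-1.
    """
    feats = [coords]
    for k in range(n_freqs):
        feats.append(np.sin((2.0 ** k) * np.pi * coords))
        feats.append(np.cos((2.0 ** k) * np.pi * coords))
    return np.concatenate(feats, axis=1)
```

Because the encoding is evaluated per coordinate, the decoder can be queried on any output grid, which is what makes arbitrary-shape inputs natural for implicit models.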
Generalizable person re-identification method using bi-stream interactive learning with feature reconstruction
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-18 · DOI: 10.1016/j.patcog.2025.111591
Feng Min, Yuhui Liu, Yixin Mao

Abstract: Recent studies have shown that metric learning and representation learning are the two main approaches to improving the generalization ability of pedestrian re-identification models. However, their relationship has not been fully explored. Unlike the adversarial emphasis of GANs, our objective is to develop an interactive and synergistic learning framework for the two. To achieve this, we propose a generalizable pedestrian re-identification method using bi-stream interactive learning. One learning stream is the correlation graph sampler (CGS) for metric learning; the other is the global sparse attention network (GSANet) for representation learning. We establish an intrinsic connection between these two learning streams. Unlike many existing methods that have high memory and computation costs or lack learning ability, CGS provides a more efficient and effective solution. CGS uses locality-sensitive hashing and feature metrics to construct a nearest neighbor graph over all categories at the beginning of training, which ensures that each batch of training samples contains randomly selected base categories and their nearest neighbor categories, providing strong similarity and challenging learning examples. As CGS sampling performance is affected by the quality of the feature map, we propose a global feature sparse reconstruction module to enhance the global self-correlation of the feature map extracted by the backbone network. Additionally, we extensively evaluate our method on large-scale datasets, including CUHK03, Market-1501, and MSMT17, where it outperforms current state-of-the-art methods. These results confirm the effectiveness of our method and demonstrate its potential in pedestrian re-identification applications.

Citations: 0
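The nearest-neighbour-graph sampling idea can be sketched with brute-force distances over per-identity centroids: each batch groups a base class with its closest classes so that visually similar identities (hard examples) co-occur. This is our own brute-force illustration; the paper's CGS uses locality-sensitive hashing to make the graph construction scale.

```python
import numpy as np

def neighbor_batches(class_feats, n_neighbors=2):
    """Pair each class with its nearest classes in feature space, so every
    training batch mixes a base identity with its most confusable ones.

    class_feats: (C, D) one feature centroid per identity.
    Returns a list of index arrays [base, neighbor_1, ..., neighbor_k].
    """
    # Full pairwise distance matrix (brute force; LSH would avoid this).
    d = np.linalg.norm(class_feats[:, None] - class_feats[None], axis=-1)
    np.fill_diagonal(d, np.inf)                     # exclude self-matches
    order = np.argsort(d, axis=1)[:, :n_neighbors]  # k nearest per class
    return [np.concatenate(([c], order[c])) for c in range(len(class_feats))]
```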
Real-world nighttime image dehazing using contrastive and adversarial learning
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-18 · DOI: 10.1016/j.patcog.2025.111596
Jingwen Deng, Patrick P.K. Chan, Daniel S. Yeung

Abstract: Nighttime image dehazing is a challenging task due to the scarcity of real hazy images and the domain gap between synthetic and real data. To address these challenges, we propose a novel deep learning framework that integrates contrastive and adversarial learning. In the initial training phase, the dehazing generator is trained on synthetic data to produce dehazed images that closely match the ground truths while maintaining a significant distance from the original hazy images through contrastive learning. Simultaneously, the contrastive learning encoder is updated to enhance its ability to distinguish between the dehazed images and ground truths, thereby increasing the difficulty of the dehazing task and pushing the generator to fully exploit feature information for improved results. To bridge the gap between synthetic and real data, the model is fine-tuned on a small set of real hazy images. To mitigate bias from the limited amount of real data, an additional constraint regulates model adjustments during fine-tuning. Empirical evaluation on multiple benchmark datasets demonstrates that our model outperforms state-of-the-art methods, providing an effective solution for improving visibility in hazy nighttime images by leveraging both synthetic and real data.

Citations: 0
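The "close to the ground truth, far from the hazy input" objective is triplet-shaped and can be sketched directly in feature space. This is a generic margin-based formulation under our own assumptions, not the paper's exact loss; real implementations typically compute these distances over pretrained perceptual features.

```python
import numpy as np

def dehazing_contrast_loss(dehazed, clear, hazy, margin=1.0):
    """Pull the dehazed output toward the clear ground truth (positive)
    and push it away from the hazy input (negative), triplet-style.

    dehazed, clear, hazy: (N, D) feature vectors for a batch of images.
    Returns the mean hinge loss; zero once every dehazed feature is at
    least `margin` closer to its ground truth than to its hazy input.
    """
    d_pos = np.linalg.norm(dehazed - clear, axis=1)   # distance to target
    d_neg = np.linalg.norm(dehazed - hazy, axis=1)    # distance to input
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```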
3D microvascular reconstruction in retinal OCT angiography images via domain-adaptive learning
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-18 · DOI: 10.1016/j.patcog.2025.111494
Jiong Zhang, Shuai Yu, Yonghuai Liu, Dan Zhang, Jianyang Xie, Tao Chen, Yalin Zheng, Huazhu Fu, Yitian Zhao

Abstract: Optical Coherence Tomography Angiography (OCTA) is a non-invasive imaging technique that acquires 3D depth-resolved information at micrometer resolution, facilitating the diagnosis of various eye-related diseases. In OCTA-based image analysis, 2D en face projected images are commonly used for quantifying microvascular changes, while the 3D images with rich depth information remain largely unexplored. This is mainly because direct 3D vessel reconstruction faces several challenges, including projection artifacts, complex vessel topology, and high computational cost. These limitations hinder comprehensive microvascular analysis and may obscure potentially vital 3D vessel biomarkers. In this study, we propose a novel method for 3D reconstruction of the retinal microvasculature from 2D en face images. Our approach capitalizes on an elaborately generated 2D OCTA depth map for vessel reconstruction, eliminating the need for 3D volumetric data that is unavailable from certain retinal imaging devices. More specifically, we first build a structure-guided depth prediction network that incorporates a domain adaptation module to evaluate the depth maps obtained from different OCTA imaging devices. A point-cloud-to-surface reconstruction method is then used to reconstruct the corresponding 3D retinal vessels from the predicted depth maps and 2D vascular information. Experimental results demonstrate the superior performance of our method compared to existing state-of-the-art techniques. Furthermore, we extract 3D vessel-related features to assess disease correlation and classification, effectively evaluating the potential of our method for guiding subsequent clinical analysis. The results show the promise of 3D microvascular analysis for early diagnosis of various eye-related diseases.

Citations: 0
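The intermediate step between the predicted depth map and the surface reconstruction, lifting the 2D vessel mask into a 3D point cloud, is simple enough to sketch. This is a minimal illustration of that lifting step under our own conventions (pixel-unit x/y, depth as z), not the paper's full pipeline.

```python
import numpy as np

def vessels_to_pointcloud(vessel_mask, depth_map):
    """Lift a 2D en face vessel mask into a 3D point cloud using a
    per-pixel depth map, the input to point-cloud-to-surface methods.

    vessel_mask: (H, W) boolean segmentation of vessel pixels
    depth_map:   (H, W) predicted depth per pixel
    Returns an (M, 3) array of (x, y, z) points, one per vessel pixel.
    """
    ys, xs = np.nonzero(vessel_mask)                 # vessel pixel indices
    return np.stack([xs, ys, depth_map[ys, xs]], axis=1)
```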
Domain consistency learning for continual test-time adaptation in image semantic segmentation
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition · Pub Date: 2025-03-17 · DOI: 10.1016/j.patcog.2025.111585
Yanyu Ye, Wei Wei, Lei Zhang, Chen Ding, Yanning Zhang

Abstract: In open-world scenarios, the challenge of distribution shift persists. Test-time adaptation adjusts the model during testing to fit the target domain’s data, addressing the distribution shift between the source and target domains. However, test-time adaptation methods still face significant challenges with continuously changing data distributions, especially since few methods are applicable to continual test-time adaptation in image semantic segmentation. Furthermore, inconsistent semantic representations across different domains result in catastrophic forgetting during continual test-time adaptation. This paper focuses on continual test-time adaptation for semantic segmentation and proposes a method named domain consistency learning. We mitigate catastrophic forgetting through feature-level and prediction-level consistency learning. Specifically, we propose domain feature consistency learning and class awareness consistency learning to guide the model, enabling the target domain model to extract generalized knowledge. Additionally, to mitigate error accumulation, we propose a novel value-based sample selection method that jointly considers the pseudo-label confidence and style representativeness of the test images. Extensive experiments on widely used semantic segmentation benchmarks demonstrate that our approach achieves satisfactory performance compared to state-of-the-art methods.

Citations: 0
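The confidence half of the sample-selection idea is commonly implemented as entropy filtering: only test samples with low prediction entropy contribute to the adaptation update. The sketch below shows that generic mechanism in NumPy; it is our simplification, omitting the paper's style-representativeness term, and the quantile threshold is an illustrative choice.

```python
import numpy as np

def select_confident(probs, quantile=0.5):
    """Keep test samples whose prediction entropy falls below the batch
    quantile; only these would drive test-time adaptation updates.

    probs: (N, C) softmax outputs per sample.
    Returns an (N,) boolean mask of selected (confident) samples.
    """
    # Shannon entropy per sample; the epsilon guards against log(0).
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return entropy <= np.quantile(entropy, quantile)
```

Filtering by a batch quantile rather than a fixed threshold keeps the selection rate stable as the target distribution drifts, which matters in the continual setting.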
Contact: info@booksci.cn · Book学术 (booksci.cn) provides free academic search services for researchers in China and abroad. Copyright © 2023 Book学术. All rights reserved.