Pattern Recognition: Latest Articles

Low-redundancy distillation for continual learning
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-28, DOI: 10.1016/j.patcog.2025.111712
Ruiqi Liu, Boyu Diao, Libo Huang, Zijia An, Hangda Liu, Zhulin An, Yongjun Xu
{"title":"Low-redundancy distillation for continual learning","authors":"Ruiqi Liu ,&nbsp;Boyu Diao ,&nbsp;Libo Huang ,&nbsp;Zijia An ,&nbsp;Hangda Liu ,&nbsp;Zhulin An ,&nbsp;Yongjun Xu","doi":"10.1016/j.patcog.2025.111712","DOIUrl":"10.1016/j.patcog.2025.111712","url":null,"abstract":"<div><div>Continual learning (CL) aims to learn new tasks without erasing previous knowledge. However, current CL methods primarily emphasize improving accuracy while often neglecting training efficiency, which consequently restricts their practical application. Drawing inspiration from the brain’s contextual gating mechanism, which selectively filters neural information and continuously updates past memories, we propose Low-redundancy Distillation (LoRD), a novel CL method that enhances model performance while maintaining training efficiency. This is achieved by eliminating redundancy in three aspects of CL: student model redundancy, teacher model redundancy, and rehearsal sample redundancy. By compressing the learnable parameters of the student model and pruning the teacher model, LoRD facilitates the retention and optimization of prior knowledge, effectively decoupling task-specific knowledge without manually assigning isolated parameters for each task. Furthermore, we optimize the selection of rehearsal samples and refine rehearsal frequency to improve training efficiency. Through a meticulous design of distillation and rehearsal strategies, LoRD effectively balances training efficiency and model precision. Extensive experimentation across various benchmark datasets and environments demonstrates LoRD’s superiority, achieving the highest accuracy with the lowest training FLOPs.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111712"},"PeriodicalIF":7.5,"publicationDate":"2025-04-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143885927","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
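The abstract above centers on combining distillation with rehearsal. Below is a minimal, generic sketch of one training step that mixes new-task data with rehearsed samples under a frozen teacher; the LoRD-specific parts (student compression, teacher pruning, rehearsal-frequency tuning) are not reproduced, and `alpha` and `temperature` are illustrative choices.

```python
# Generic distillation-plus-rehearsal step; NOT the LoRD method itself.
import torch
import torch.nn.functional as F

def distill_rehearsal_step(student, teacher, new_batch, rehearsal_batch,
                           optimizer, alpha=0.5, temperature=2.0):
    """One step mixing new-task data with rehearsed old-task samples."""
    x_new, y_new = new_batch
    x_old, y_old = rehearsal_batch
    x = torch.cat([x_new, x_old])          # joint forward pass over both sources
    y = torch.cat([y_new, y_old])

    student_logits = student(x)
    with torch.no_grad():                   # frozen teacher from the previous task
        teacher_logits = teacher(x)

    ce = F.cross_entropy(student_logits, y)
    kd = F.kl_div(                          # soft-label distillation loss
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean") * temperature ** 2

    loss = (1 - alpha) * ce + alpha * kd
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```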
Dual-branch adjacent connection and channel mixing network for video crowd counting
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-26, DOI: 10.1016/j.patcog.2025.111709
Miaogen Ling, Jixuan Chen, Yongwen Liu, Wei Fang, Xin Geng
{"title":"Dual-branch adjacent connection and channel mixing network for video crowd counting","authors":"Miaogen Ling ,&nbsp;Jixuan Chen ,&nbsp;Yongwen Liu ,&nbsp;Wei Fang ,&nbsp;Xin Geng","doi":"10.1016/j.patcog.2025.111709","DOIUrl":"10.1016/j.patcog.2025.111709","url":null,"abstract":"<div><div>This paper focuses on the problem of video crowd counting, which usually uses the spatial and temporal correlations of the consecutive frames to achieve better performance than the single-image crowd counting methods. However, most of the current video crowd counting methods either use only two or three frames for optical flow or frame-difference feature extraction or construct a single-branch network to extract spatiotemporal correlated features. The interactions of features for multiple adjacent frames, which can effectively prevent disturbances caused by background noise, are mostly overlooked. Considering the above problems, we propose a dual-branch adjacent connection and channel mixing network for multi-frame video crowd counting. For the upper branch, an adjacent layer connection method is proposed to capture the multi-scaled spatiotemporal correlations among multiple consecutive frames instead of the traditional dense connections in decomposed 3D convolutional blocks. It achieves better performance and low computation cost. For the lower branch, adaptive temporal channel mixing blocks are proposed to exchange partial channel information among the adjacent frames for feature interaction. The partial channel transpose operation is first proposed to exchange information. It is parameter-free and flexible to achieve interactions among features of any number of consecutive frames. The proposed method outperforms the current image-based and video-based crowd counting models, achieving state-of-the-art performance on six publicly available datasets. The code is available at: <span><span>https://github.com/aaaabbbbcccccjxzxj/mfvcc</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111709"},"PeriodicalIF":7.5,"publicationDate":"2025-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
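For intuition, here is a minimal, parameter-free sketch of exchanging a fraction of channels between adjacent frames, the general idea behind the partial channel transpose described above; the exchange ratio and the circular roll at the sequence boundary are illustrative assumptions, not the authors' exact operation.

```python
# Exchange a fraction of channels between neighbouring frames of a video feature tensor.
import torch

def partial_channel_exchange(x, ratio=0.25):
    """x: (B, T, C, H, W) feature maps for T consecutive frames."""
    b, t, c, h, w = x.shape
    k = int(c * ratio)                  # number of channels to exchange
    out = x.clone()
    # roll the first k channels forward in time and the next k backward,
    # so each frame receives information from both of its neighbours
    out[:, :, :k] = torch.roll(x[:, :, :k], shifts=1, dims=1)
    out[:, :, k:2 * k] = torch.roll(x[:, :, k:2 * k], shifts=-1, dims=1)
    return out

feats = torch.randn(2, 8, 64, 32, 32)   # 8 frames, 64 channels
print(partial_channel_exchange(feats).shape)  # unchanged shape, mixed channels
```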
Crack segmentation network via difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-25, DOI: 10.1016/j.patcog.2025.111723
Jianming Zhang, Shigen Zhang, Dianwen Li, Jianxin Wang, Jin Wang
{"title":"Crack segmentation network via difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention","authors":"Jianming Zhang ,&nbsp;Shigen Zhang ,&nbsp;Dianwen Li ,&nbsp;Jianxin Wang ,&nbsp;Jin Wang","doi":"10.1016/j.patcog.2025.111723","DOIUrl":"10.1016/j.patcog.2025.111723","url":null,"abstract":"<div><div>Cracks are the most common pavement defects, and failure to rehabilitate them promptly can lead to more severe road damage. Due to the thin, long, and irregular nature of cracks, their precise measurement remains a challenge. To tackle these issues, a network (DCCM-Net) via the difference convolution-based encoder and hybrid CNN-Mamba multi-scale attention is proposed. First, an enhanced convolution module (ECM) is designed as a core component of the encoder to extract edge information of cracks. Besides our proposed diagonal difference convolution operator, the ECM uses five types of difference convolution operator to capture edge information of crack images, respectively along the horizontal, vertical, and diagonal directions in the Cartesian coordinate system, as well as the polar axis and polar angle directions in the polar coordinate system. Second, to overcome the limitation of the ECM-based encoder only extracting local features, a mixed convolution and Mamba (MixConv-Mamba) attention module for skip-connection is proposed. This module uses multi-scale depthwise separable convolutions to extract rich spatial features and effectively extracts global features using the Mamba block. The obtained features are processed using self-attention mechanisms, and the final features can capture spatial and long-range dependencies. Third, a feature fusion module (FFM) is introduced as a key component of the decoder to fuse the deep-layer and shallow-layer features effectively. Finally, in comparison with nine advanced networks across three public datasets—CrackTree260, DeepCrack, and CrackForest—our network demonstrates superior performance, achieving mean intersection over union (mIoU) scores of 84.92%, 86.60%, and 80.61%, respectively.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111723"},"PeriodicalIF":7.5,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
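As a reference point, the sketch below implements a central-difference-style convolution, one simple member of the difference-convolution family the encoder draws on; the directional (horizontal, vertical, diagonal, polar) variants of the ECM are not reproduced, and `theta` is an illustrative blending factor.

```python
# Central-difference-style convolution: conv over (x_k - x_centre) blended with vanilla conv.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DifferenceConv2d(nn.Module):
    def __init__(self, in_ch, out_ch, kernel_size=3, theta=0.7):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              padding=kernel_size // 2, bias=False)
        self.theta = theta  # 0 -> plain convolution, 1 -> pure difference convolution

    def forward(self, x):
        vanilla = self.conv(x)
        # applying the summed kernel weights to the centre pixel turns
        # sum_k w_k * x_k into sum_k w_k * (x_k - x_centre)
        kernel_sum = self.conv.weight.sum(dim=(2, 3), keepdim=True)
        centre = F.conv2d(x, kernel_sum)
        return vanilla - self.theta * centre

print(DifferenceConv2d(3, 16)(torch.randn(1, 3, 64, 64)).shape)  # (1, 16, 64, 64)
```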
PRGS: Patch-to-Region Graph Search for Visual Place Recognition
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-24, DOI: 10.1016/j.patcog.2025.111673
Weiliang Zuo, Liguo Liu, Yizhe Li, Yanqing Shen, Fuhua Xiang, Jingmin Xin, Nanning Zheng
{"title":"PRGS: Patch-to-Region Graph Search for Visual Place Recognition","authors":"Weiliang Zuo,&nbsp;Liguo Liu,&nbsp;Yizhe Li,&nbsp;Yanqing Shen,&nbsp;Fuhua Xiang,&nbsp;Jingmin Xin,&nbsp;Nanning Zheng","doi":"10.1016/j.patcog.2025.111673","DOIUrl":"10.1016/j.patcog.2025.111673","url":null,"abstract":"<div><div>Visual Place Recognition (VPR) is a task to estimate the target location based on visual information in changing scenarios, which usually uses a two-stage strategy of global retrieval and reranking. Existing reranking methods in VPR establish a single correspondence between the query image and the candidate images for reranking, which almost overlooks the neighbor correspondences in retrieved candidate images that can help to enhance reranking. In this paper, we propose a <strong>P</strong>atch-to-<strong>R</strong>egion <strong>G</strong>raph <strong>S</strong>earch (PRGS) method to enhance reranking using neighbor correspondences in candidate images. Firstly, considering that searching for neighbor correspondences relies on important features, we design a <strong>P</strong>atch-to-<strong>R</strong>egion (PR) module, which aggregates patch level features into region level features for highlighting important features. Secondly, to estimate the candidate image reranking score using the neighbor correspondences, we design a <strong>G</strong>raph <strong>S</strong>earch (GS) module, which establishes the neighbor correspondences among all candidates and query images in graph space. What is more, PRGS integrates well with both CNN and transformer backbone. We achieve competitive performance on several benchmarks, offering a 64% improvement in matching time and approximately 59% reduction in FLOPs compared to state-of-the-art methods. The code is released at <span><span>https://github.com/LKELN/PRGS</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"166 ","pages":"Article 111673"},"PeriodicalIF":7.5,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143874965","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
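The PR module aggregates patch-level features into region-level features. A minimal sketch of that aggregation step is given below, using plain adaptive average pooling over the patch grid as an illustrative stand-in; the paper's actual aggregation and the region grid size are not specified here.

```python
# Pool ViT/CNN patch tokens laid out on a grid into a small set of region features.
import torch
import torch.nn.functional as F

def patches_to_regions(patch_tokens, grid_hw, region_hw=(4, 4)):
    """patch_tokens: (B, N, D) with N = H*W patches arranged on a grid_hw grid."""
    b, n, d = patch_tokens.shape
    h, w = grid_hw
    x = patch_tokens.transpose(1, 2).reshape(b, d, h, w)   # back to a 2-D feature map
    regions = F.adaptive_avg_pool2d(x, region_hw)           # pool patches into regions
    return regions.flatten(2).transpose(1, 2)                # (B, R, D) region features

tokens = torch.randn(2, 196, 256)                            # 14 x 14 patch grid
print(patches_to_regions(tokens, (14, 14)).shape)            # (2, 16, 256)
```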
Enhancing robustness and sparsity: Least squares one-class support vector machine
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-24, DOI: 10.1016/j.patcog.2025.111691
Anuradha Kumari, M. Tanveer
{"title":"Enhancing robustness and sparsity: Least squares one-class support vector machine","authors":"Anuradha Kumari,&nbsp;M. Tanveer","doi":"10.1016/j.patcog.2025.111691","DOIUrl":"10.1016/j.patcog.2025.111691","url":null,"abstract":"<div><div>In practical applications, identifying data points that deviate from general patterns, known as one-class classification (OCC), is crucial. The least squares one-class support vector machine (LS-OCSVM) is effective for OCC; however, it has limitations: it is sensitive to outliers and noise, and its non-sparse formulation restricts scalability. To address these challenges, we introduce two novel models: the robust least squares one-class support vector machine (RLS-1SVM) and the sparse robust least squares one-class support vector machine (SRLS-1SVM). RLS-1SVM improves robustness by minimizing both mean and variance of modeling errors, and integrating distribution information to mitigate random noise. SRLS-1SVM introduces sparsity by applying the representer theorem and pivoted Cholesky decomposition, marking the first sparse LS-OCSVM adaptation for batch learning. The proposed models exhibit robust empirical and theoretical strengths, with established upper bounds on both empirical and generalization errors. Evaluations on UCI and CIFAR-10 dataset show that RLS-1SVM and SRLS-1SVM deliver superior performance with faster training/testing times. The codes of the proposed models are available at <span><span>https://github.com/mtanveer1/RLS-1SVM</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111691"},"PeriodicalIF":7.5,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143886051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
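For background, the sketch below shows a plain kernel least-squares one-class baseline: training points are regressed to a target value of 1 with a ridge penalty, and test points are scored by how close their fitted value stays to 1. It only illustrates the least-squares error term that LS-OCSVM-style models build on; none of the robust or sparse machinery of RLS-1SVM / SRLS-1SVM appears here, and the RBF bandwidth and regularizer are illustrative.

```python
# Kernel least-squares one-class baseline for intuition only.
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_ls_one_class(X_train, reg=1e-2, gamma=0.5):
    K = rbf_kernel(X_train, X_train, gamma)
    # solve (K + reg*I) alpha = 1 so that fitted values on training data are near 1
    return np.linalg.solve(K + reg * np.eye(len(X_train)), np.ones(len(X_train)))

def score(X_test, X_train, alpha, gamma=0.5):
    # values near 1 resemble the target class; small values look anomalous
    return rbf_kernel(X_test, X_train, gamma) @ alpha

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
alpha = fit_ls_one_class(X)
print(score(np.array([[0.0, 0.0], [6.0, 6.0]]), X, alpha))
```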
Sketch-SparseNet: Sparse convolution framework for sketch recognition
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-24, DOI: 10.1016/j.patcog.2025.111682
Jingru Yang, Jin Wang, Yang Zhou, Guodong Lu, Yu Sun, Huan Yu, Heming Fang, Zhihui Li, Shengfeng He
{"title":"Sketch-SparseNet: Sparse convolution framework for sketch recognition","authors":"Jingru Yang ,&nbsp;Jin Wang ,&nbsp;Yang Zhou ,&nbsp;Guodong Lu ,&nbsp;Yu Sun ,&nbsp;Huan Yu ,&nbsp;Heming Fang ,&nbsp;Zhihui Li ,&nbsp;Shengfeng He","doi":"10.1016/j.patcog.2025.111682","DOIUrl":"10.1016/j.patcog.2025.111682","url":null,"abstract":"<div><div>In free-hand sketch recognition, state-of-the-art methods often struggle to extract spatial features from sketches with sparse distributions, which are characterized by significant blank regions devoid of informative content. To address this challenge, we introduce a novel framework for sketch recognition, termed <em>Sketch-SparseNet</em>. This framework incorporates an advanced convolutional component: the Sketch-Driven Dilated Deformable Block (<span><math><mrow><mi>S</mi><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>B</mi></mrow></math></span>). This component excels at extracting spatial features and accurately recognizing free-hand sketches with sparse distributions. The <span><math><mrow><mi>S</mi><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>B</mi></mrow></math></span> component innovatively bridges gaps in the blank areas of sketches by establishing spatial relationships among disconnected stroke points through adaptive reshaping of convolution kernels. These kernels are deformable, dilatable, and dynamically positioned relative to the sketch strokes, ensuring the preservation of spatial information from sketch points. Consequently, <em>Sketch-SparseNet</em> extracts a more accurate and compact representation of spatial features, enhancing sketch recognition performance. Additionally, we introduce the SmoothAlign loss function, which minimizes the disparity between the output features of parallel <span><math><mrow><mi>S</mi><msup><mrow><mi>D</mi></mrow><mrow><mn>3</mn></mrow></msup><mi>B</mi></mrow></math></span> and CNNs, facilitating effective feature fusion. Extensive evaluations on the QuickDraw-414k and TU-Berlin datasets highlight our method’s state-of-the-art performance, achieving accuracies of 79.52% and 85.78%, respectively. To our knowledge, this work represents the first application of a sparse convolution framework that substantially alleviates the adverse effects of sparse sketch points. The codes are available at <span><span>https://github.com/kingbackyang/Sketch-SparseNet</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111682"},"PeriodicalIF":7.5,"publicationDate":"2025-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143890575","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
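As context for the sparse-convolution framing, the snippet below shows the kind of sparse representation such frameworks operate on: only stroke pixels are kept as coordinate/feature pairs, so blank regions contribute no computation. The deformable, dilatable SD³B kernels themselves are not reproduced here.

```python
# Convert a rasterized sketch into the (coordinates, features) form sparse conv libraries consume.
import torch

def rasterized_sketch_to_sparse(sketch):
    """sketch: (H, W) binary raster; returns active coordinates and their features."""
    coords = sketch.nonzero(as_tuple=False)                           # (P, 2) stroke-pixel coords
    feats = sketch[coords[:, 0], coords[:, 1]].unsqueeze(1).float()   # (P, 1) per-point feature
    return coords, feats

canvas = torch.zeros(256, 256)
canvas[100, 50:200] = 1.0                    # one horizontal stroke
coords, feats = rasterized_sketch_to_sparse(canvas)
print(coords.shape, feats.shape)             # 150 active points out of 65,536 pixels
```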
ChatSearch: A dataset and a generative retrieval model for general conversational image retrieval
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-23, DOI: 10.1016/j.patcog.2025.111696
Zijia Zhao, Longteng Guo, Tongtian Yue, Erdong Hu, Shuai Shao, Zehuan Yuan, Hua Huang, Jing Liu
{"title":"ChatSearch: A dataset and a generative retrieval model for general conversational image retrieval","authors":"Zijia Zhao ,&nbsp;Longteng Guo ,&nbsp;Tongtian Yue ,&nbsp;Erdong Hu ,&nbsp;Shuai Shao ,&nbsp;Zehuan Yuan ,&nbsp;Hua Huang ,&nbsp;Jing Liu","doi":"10.1016/j.patcog.2025.111696","DOIUrl":"10.1016/j.patcog.2025.111696","url":null,"abstract":"<div><div>In this paper, we investigate the task of general conversational image retrieval on open-domain images. The objective is to search for images based on interactive conversations between humans and computers. To advance this task, we curate a dataset called ChatSearch. This dataset includes a <strong>multi-round multimodal conversational context query</strong> for each target image, thereby requiring the retrieval system to find the accurate image from database. Simultaneously, we propose a generative retrieval model named ChatSearcher, which is trained end-to-end to accept/produce interleaved image–text inputs/outputs. ChatSearcher exhibits strong capability in reasoning with multimodal context and can leverage world knowledge to yield visual retrieval results. It demonstrates superior performance on the ChatSearch dataset and also achieves competitive results on other image retrieval tasks and visual conversation tasks. We anticipate that this work will inspire further research on interactive multimodal retrieval systems.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111696"},"PeriodicalIF":7.5,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143885928","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
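For intuition only, the sketch below ranks pre-computed image embeddings against an embedding of the flattened conversation. The encoder here is a placeholder, and this dual-encoder ranking is a simplification; ChatSearcher itself is a generative model over interleaved image-text sequences.

```python
# Rank candidate images by cosine similarity to a conversational query embedding.
import torch
import torch.nn.functional as F

def rank_images(conversation_turns, text_encoder, image_embeddings):
    """conversation_turns: list of strings; image_embeddings: (N, D), pre-computed."""
    query = " ".join(conversation_turns)             # flatten the dialogue into one query
    q = F.normalize(text_encoder(query), dim=-1)     # (D,)
    sims = F.normalize(image_embeddings, dim=-1) @ q  # cosine similarity to every image
    return sims.argsort(descending=True)              # indices of best-matching images

# toy usage with a stand-in encoder and random image embeddings
dummy_encoder = lambda s: torch.randn(64)
order = rank_images(["show me a beach", "with a red umbrella"],
                    dummy_encoder, torch.randn(1000, 64))
print(order[:5])
```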
Cross-Modality Interactive Attention Network for AI-generated image quality assessment
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-23, DOI: 10.1016/j.patcog.2025.111693
Tianwei Zhou, Songbai Tan, Leida Li, Baoquan Zhao, Qiuping Jiang, Guanghui Yue
{"title":"Cross-Modality Interactive Attention Network for AI-generated image quality assessment","authors":"Tianwei Zhou ,&nbsp;Songbai Tan ,&nbsp;Leida Li ,&nbsp;Baoquan Zhao ,&nbsp;Qiuping Jiang ,&nbsp;Guanghui Yue","doi":"10.1016/j.patcog.2025.111693","DOIUrl":"10.1016/j.patcog.2025.111693","url":null,"abstract":"<div><div>Recently, AI-generative techniques have revolutionized image creation, prompting the need for AI-generated image (AGI) quality assessment. This paper introduces CIA-Net, a Cross-modality Interactive Attention Network, for blind AGI quality evaluation. Using a multi-task framework, CIA-Net processes text and image inputs to output consistency, visual quality, and authenticity scores. Specifically, CIA-Net first encodes two-modal data to obtain textual and visual embeddings. Next, for consistency score prediction, it computes the similarity between these two kinds of embeddings in view of that text-to-image alignment. For visual quality prediction, it fuses textural and visual embeddings using a well-designed cross-modality interactive attention module. For authenticity score prediction, it constructs a textural template that contains authenticity labels and computes the joint probability from the similarity between the textural embeddings of each element and the visual embeddings. Experimental results show that CIA-Net is more competent for the AGI quality assessment task than 11 state-of-the-art competing methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111693"},"PeriodicalIF":7.5,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143873287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
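The template-based authenticity scoring described above resembles zero-shot classification with text templates. The sketch below shows that generic pattern (cosine similarity to label-template embeddings, softmax over labels) with placeholder embeddings and a typical CLIP-style temperature; it is not CIA-Net's actual component.

```python
# Turn similarities between a visual embedding and authenticity-label text embeddings
# into a probability over labels.
import torch
import torch.nn.functional as F

def authenticity_probability(visual_emb, template_embs, temperature=0.07):
    """visual_emb: (D,); template_embs: (L, D), one row per authenticity label."""
    v = F.normalize(visual_emb, dim=-1)
    t = F.normalize(template_embs, dim=-1)
    return F.softmax(t @ v / temperature, dim=-1)

# e.g. templates could be ["a natural photograph", "an AI-generated image"]
probs = authenticity_probability(torch.randn(512), torch.randn(2, 512))
print(probs)  # probability mass over the two authenticity labels
```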
Frequency-Aware Self-Supervised Group Activity Recognition with skeleton sequences
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-21, DOI: 10.1016/j.patcog.2025.111710
Guoquan Wang, Mengyuan Liu, Hong Liu, Jinyan Zhang, Peini Guo, Ruijia Fan, Siyu Chen
{"title":"Frequency-Aware Self-Supervised Group Activity Recognition with skeleton sequences","authors":"Guoquan Wang,&nbsp;Mengyuan Liu,&nbsp;Hong Liu,&nbsp;Jinyan Zhang,&nbsp;Peini Guo,&nbsp;Ruijia Fan,&nbsp;Siyu Chen","doi":"10.1016/j.patcog.2025.111710","DOIUrl":"10.1016/j.patcog.2025.111710","url":null,"abstract":"<div><div>Self-supervised, skeleton-based techniques have recently demonstrated great potential for group activity recognition via contrastive learning. However, these methods have difficulty accommodating the dynamic and complex nature of spatio-temporal data, weakening the ability to conduct effective modeling and extract crucial features. To this end, we propose a novel <strong>F</strong>requency-<strong>A</strong>ware <strong>G</strong>roup <strong>A</strong>ctivity <strong>R</strong>ecognition (FAGAR) network, which offers a comprehensive solution by addressing three key subproblems. First, the challenge of extracting discriminative features is further exacerbated by pose estimation algorithms’ limitations under random spatio-temporal data augmentation. To mitigate this, a frequency domain passing augmentation method that emphasizes individual collaborative changes is introduced, effectively filtering out noise interference. Second, the fixed connections in traditional relation modeling networks fail to adapt to dynamic scene changes. To address this, we design an adaptive frequency domain compression network, which dynamically adjusts to scene variations. Third, the temporal modeling process often leads to a loss of focus on key features, reducing the model’s ability to assess individual contributions within a group. To resolve this, we propose an amplitude-aware loss function that guides the network in learning the relative importance of individuals, ensuring it maintains the correct learning direction. Our FAGAR achieves state-of-the-art performance on several datasets for self-supervised skeleton-based group activity recognition. Code is available at <span><span>https://github.com/WGQ109/FAGAR</span><svg><path></path></svg></span>.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111710"},"PeriodicalIF":7.5,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
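The frequency-domain passing augmentation mentioned above can be pictured as filtering the skeleton sequence along the time axis. The sketch below applies a simple low-pass filter via an FFT; the cutoff and the low-pass (rather than band-pass) choice are illustrative assumptions, not FAGAR's exact policy.

```python
# Low-pass filter a skeleton sequence along time to suppress high-frequency pose jitter.
import torch

def frequency_lowpass(skeleton, keep_ratio=0.5):
    """skeleton: (T, J, C) sequence of J joints with C coordinates over T frames."""
    spec = torch.fft.rfft(skeleton, dim=0)            # frequency representation along time
    cutoff = max(1, int(spec.shape[0] * keep_ratio))
    spec[cutoff:] = 0                                  # drop the high-frequency band
    return torch.fft.irfft(spec, n=skeleton.shape[0], dim=0)

seq = torch.randn(64, 17, 3)                           # 64 frames, 17 joints, (x, y, z)
print(frequency_lowpass(seq).shape)                    # (64, 17, 3), smoothed over time
```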
A novel federated learning framework for semantic segmentation of terminal block in smart substation
IF 7.5, CAS Q1, Computer Science
Pattern Recognition, Pub Date: 2025-04-21, DOI: 10.1016/j.patcog.2025.111665
Rong Xie, Zhong Chen, Weiguo Cao, Congying Wu, Tiecheng Li
{"title":"A novel federated learning framework for semantic segmentation of terminal block in smart substation","authors":"Rong Xie ,&nbsp;Zhong Chen ,&nbsp;Weiguo Cao ,&nbsp;Congying Wu ,&nbsp;Tiecheng Li","doi":"10.1016/j.patcog.2025.111665","DOIUrl":"10.1016/j.patcog.2025.111665","url":null,"abstract":"<div><div>Recent advancements in computer vision have significantly enhanced the intelligence operation and maintenance of substation equipment. In this paper, we advance this progress and focus on semantic segmentation of secondary screen cabinet terminal blocks in substations. We note that existing schemes are centralized, which may be unscalable, and more importantly, may be very difficult to protect data privacy. In response, we develop a novel semantic segmentation framework based on federated learning. This framework includes a federated learning system composed of a trusted third party, a cloud server, multiple power stations, and substations across various regions. To ensure substation security, our design incorporates anonymous identity verification managed by the trusted third party and other participants. Local substations then employ the designed semantic segmentation model to extract data and model elements through cameras and store them in distributed power stations. To address data heterogeneity in distributed semantic segmentation, we design a diffusion model for data augmentation and improve the feature similarity loss, which helps mitigate the local optima and enhance the global generalization capability of the final model. Experiments conducted using real data from multiple substations have demonstrated that our framework achieves an intelligent terminal block recognition system with an accuracy of 93.41% and mIoU of 81.37%.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"167 ","pages":"Article 111665"},"PeriodicalIF":7.5,"publicationDate":"2025-04-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143883162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
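As background on the federated setup, the sketch below shows a standard FedAvg-style aggregation step on the server side (sample-count-weighted averaging of uploaded model weights). The paper's anonymous identity verification and diffusion-based augmentation are outside the scope of this sketch.

```python
# FedAvg-style server aggregation: weighted average of client model parameters.
import torch

def federated_average(client_state_dicts, client_sample_counts):
    """Return a new state dict averaging client parameters, weighted by sample counts."""
    total = sum(client_sample_counts)
    averaged = {}
    for key in client_state_dicts[0]:
        averaged[key] = sum(sd[key] * (n / total)
                            for sd, n in zip(client_state_dicts, client_sample_counts))
    return averaged

# toy usage with two "clients" sharing the same architecture
m1, m2 = torch.nn.Linear(4, 2), torch.nn.Linear(4, 2)
global_weights = federated_average([m1.state_dict(), m2.state_dict()], [300, 100])
print(global_weights["weight"].shape)  # (2, 4)
```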