Pattern Recognition: Latest Articles

Learning multi-granularity representation with transformer for visible-infrared person re-identification
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-04 DOI: 10.1016/j.patcog.2025.111510
Yujian Feng , Feng Chen , Guozi Sun , Fei Wu , Yimu Ji , Tianliang Liu , Shangdong Liu , Xiao-Yuan Jing , Jiebo Luo
Visible-infrared person re-identification (VI-ReID) aims to match pedestrian images across the visible and near-infrared modalities. Pedestrian images in the two modalities contain discriminative features at different sizes and positions, e.g., the global color of clothing, the local pose of the body, and shoes at the pixel scale. However, existing methods mainly capture features at a single granularity, ignoring the multi-granularity information that contributes to pedestrian identification. We therefore propose a cross-modality multi-granularity Transformer (CM²GT) framework to address this issue. CM²GT learns coarse-to-fine feature representations and integrates discriminative information across granularities, alleviating the irrelevant matching and ambiguous alignment caused by matching single-granularity features. Specifically, we first design a Transformer-based multi-granularity feature extractor (MGFE) module to capture global-, patch-, and pixel-level features of each modality, flexibly representing semantic information at multiple scales. Second, a multi-granularity fusion Transformer (MGFT) module mines the hierarchical relationships between multi-granularity features with a saliency-enhanced Transformer, ensuring identity-wise saliency consistency across granularities and modalities. Furthermore, to enhance cross-modality intra-class clustering in the latent space, we design a cross-modality nearest-neighbor clustering (CNC) loss that minimizes the distance between an anchor sample and its cross-modality nearest neighbor. Extensive experiments demonstrate that our approach outperforms state-of-the-art methods.

Pattern Recognition, Vol. 164, Article 111510. Citations: 0
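One plausible reading of the CNC loss above is: pull each anchor toward its nearest same-identity neighbor in the other modality. A minimal numpy sketch under that assumption (the paper's exact formulation, margins, and normalization may differ; all names here are illustrative):

```python
import numpy as np

def cnc_loss(feats_vis, feats_ir, labels_vis, labels_ir):
    """Hypothetical sketch of a cross-modality nearest-neighbor clustering
    (CNC) style loss: for each visible anchor, find its nearest infrared
    sample of the same identity and penalize their distance."""
    loss = 0.0
    for i, y in enumerate(labels_vis):
        same_id = feats_ir[labels_ir == y]          # cross-modality, same identity
        dists = np.linalg.norm(same_id - feats_vis[i], axis=1)
        loss += dists.min()                          # pull anchor to nearest neighbor
    return loss / len(labels_vis)
```

If the two modalities produce identical embeddings for an identity, the loss is zero; any cross-modality gap contributes its nearest-neighbor distance.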
TSTKD: Triple-spike train kernel-driven supervised learning algorithm
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-04 DOI: 10.1016/j.patcog.2025.111525
Guojun Chen , Guoen Wang
Supervised learning for spiking neurons (SNs) plays a fundamental role in precise artificial intelligence. This study proposes a novel supervised learning algorithm based on triple-spike train kernels to address shortcomings of recent learning algorithms, such as convergence to locally optimal solutions and low learning accuracy. First, we divide the time intervals of the spike trains, including the firing times of the input spikes. We then analyze the relationships between the firing times of all spikes, add a third spike to resolve the existing problem, and construct a triple-spike-driven (TSD) minimal direct computational unit. In addition to the simple and efficient pair-spike-based adjustment of synaptic weights, TSD maintains relationships among all useful spikes to approximate globally optimal learning. Finally, we propose the triple-spike train kernel-driven (TSTKD) supervised learning algorithm to improve learning performance. Extensive experiments demonstrate the learning ability of the proposed algorithm on spike-train learning tasks and analyze the relevant learning factors, and we verify the positive effect of the TSD unit. The proposed algorithm achieves considerably higher learning accuracy than several recent algorithms, especially on complex spike-train learning, and adapts better to SNs, with stronger generalization, memorization, and classification than the corresponding pair-spike algorithm. These results open a path for pattern recognition using spike-train supervised learning with global optimization.

Pattern Recognition, Vol. 164, Article 111525. Citations: 0
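The triple-spike kernel itself is not specified in the abstract. As background for kernel-driven spike-train learning, a standard pair-spike Gaussian kernel between two spike trains (lists of firing times) can be sketched as follows; the `sigma` bandwidth is an assumed parameter, and the paper's triple-spike construction extends beyond such spike pairs:

```python
import numpy as np

def spike_train_kernel(s, t, sigma=1.0):
    """Pair-spike Gaussian kernel between two spike trains: sums
    exp(-(s_i - t_j)^2 / (2*sigma^2)) over all spike pairs. Standard
    background for kernel-driven spike-train learning; not the paper's
    triple-spike kernel."""
    s, t = np.asarray(s, float), np.asarray(t, float)
    diff = s[:, None] - t[None, :]                  # all pairwise time differences
    return np.exp(-diff**2 / (2 * sigma**2)).sum()
```

Kernels of this family are symmetric and peak when firing times coincide, which is what makes them usable as similarity measures between desired and actual output spike trains.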
Learning to Restore Arbitrary Hybrid adverse weather Conditions in one go
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-04 DOI: 10.1016/j.patcog.2025.111504
Yecong Wan, Mingwen Shao, Yuanshuo Cheng, Yuexian Liu, Zhiyuan Bao
Adverse conditions typically involve stochastic hybrid weather degradations (e.g., rain and haze at night), whereas existing image restoration algorithms assume that weather degradations occur independently and may therefore fail in complicated real-world scenarios. Moreover, supervised training is infeasible due to the lack of a comprehensive paired dataset characterizing hybrid weather conditions. We address these limitations with two tactics: framework and data. First, we present a novel unified framework, dubbed RAHC, to Restore Arbitrary Hybrid adverse weather Conditions in one go. RAHC leverages a multi-head aggregation architecture to learn multiple degradation-representation subspaces and constrains the network, via a discrimination mechanism in the output space, to flexibly handle multiple hybrid adverse weather conditions in a unified paradigm. Furthermore, we devise a reconstruction-vector-aided scheme that provides auxiliary visual content cues for reconstruction, so the model can comfortably cope with hybrid scenarios in which few image constituents remain. Second, we establish a new dataset, termed HAC, for learning and benchmarking arbitrary Hybrid Adverse Conditions restoration. HAC contains 31 scenarios composed of arbitrary combinations of five common weather types, with a total of ∼316K adverse-weather/clean pairs. The training set is generated automatically by a dedicated AdverseGAN with minimal labor, while the test set is manually curated by experts for authoritative evaluation. Extensive experiments yield superior results and, in particular, establish new state-of-the-art results on both HAC and conventional datasets.

Pattern Recognition, Vol. 164, Article 111504. Citations: 0
Mask-Aware 3D axial transformer for video inpainting
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-04 DOI: 10.1016/j.patcog.2025.111509
Hongyi Sun, Wanhua Li, Jiwen Lu, Jie Zhou
In this paper, we propose a Mask-Aware 3D Axial Transformer for efficient and effective video inpainting, which recovers the missing content of a video by leveraging long-range information efficiently. Recent work shows that the transformer architecture achieves promising video inpainting performance thanks to its capability to exploit long-range consistency across frames, but computing global self-attention has high time complexity. Moreover, existing transformer-based inpainting methods treat the valid and invalid regions of the masked image identically when computing self-attention, so the network cannot distinguish them. To address these issues, we first design a 3D Axial Transformer that splits the input features into three shapes of stripes: a horizontal stripe and a vertical stripe for intra-frame attention, and a temporal stripe for inter-frame attention. With three such transformer blocks stacked, any two spatial-temporal pixels across all video frames can interact while maintaining high efficiency. We also devise a mask-aware module that predicts a reliability score for masked pixels, helping the transformer avoid leveraging information from invalid regions. Extensive experimental results on the YouTube-VOS and DAVIS datasets show that our approach outperforms the state of the art.

Pattern Recognition, Vol. 164, Article 111509. Citations: 0
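The stripe idea can be sketched numerically: restrict single-head attention to one axis of a (T, H, W, C) feature tensor at a time, then stack the three axes. This simplified sketch omits learned Q/K/V projections and the mask-aware reliability weighting, so it illustrates only the axial restriction, not the full method:

```python
import numpy as np

def axial_attention(x, axis):
    """Single-head self-attention restricted to one axis of a (T, H, W, C)
    tensor: each stripe of length L along `axis` attends only within itself."""
    x = np.moveaxis(x, axis, -2)                    # (..., L, C): L = stripe length
    scores = x @ x.swapaxes(-1, -2) / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                   # softmax over the stripe
    return np.moveaxis(w @ x, -2, axis)

# Stacking the three stripe directions lets any two spatio-temporal
# positions exchange information through intermediate stripes.
feats = np.random.randn(2, 4, 4, 8)                 # (T, H, W, C)
for ax in (1, 2, 0):                                # vertical, horizontal, temporal
    feats = axial_attention(feats, ax)
```

Each stripe attention costs O(L²) rather than O((T·H·W)²), which is the efficiency argument behind axial designs.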
CGViT: Cross-image GroupViT for zero-shot semantic segmentation
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-03 DOI: 10.1016/j.patcog.2025.111505
Jie Jiang , Xingjian He , Xinxin Zhu , Weining Wang , Jing Liu
Recently, with the growth of image-text data, such coarse supervision has also been introduced to image semantic segmentation. However, previous works simply transfer methods from other visual tasks, ignoring the task characteristics of semantic segmentation. In this work, we propose a Cross-image GroupViT (CGViT) for zero-shot semantic segmentation that constructs a semantically consistent feature representation across images. Specifically, we improve the earlier GroupViT in two respects. First, we propose two grouping blocks and update them with a momentum-based method, building a semantically consistent cross-image feature representation. Second, we introduce image-level supervision for learning semantic information and token-level supervision for fine-grained information, obtaining hierarchical information for semantic segmentation. We train the model on image-text data and transfer it to zero-shot semantic segmentation without fine-tuning. CGViT achieves new state-of-the-art results on three challenging datasets; in particular, it obtains 49.30% mIoU on the PASCAL VOC dataset when pre-trained only on the CC12M dataset.

Pattern Recognition, Vol. 164, Article 111505. Citations: 0
Iterative knowledge distillation and pruning for model compression in unsupervised domain adaptation
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-03 DOI: 10.1016/j.patcog.2025.111512
Zhiyuan Wang , Long Shi , Zhen Mei , Xiang Zhao , Zhe Wang , Jun Li
In practical applications, deep learning models often face inconsistent distributions between training and test data as well as insufficient labeled data. Unsupervised domain adaptation (UDA) based transfer learning has gained significant attention for these problems, but existing UDA models struggle to meet the requirements of real-time and resource-constrained scenarios. Although model compression can accelerate UDA, it usually degrades performance. In this paper, we propose an iterative transfer model compression (ITMC) method built on two alternating modules: transfer knowledge distillation (TKD) and adaptive channel pruning (ACP). Their tight coupling compresses the model effectively while preserving performance on the target domain. In the TKD phase, the teacher and student models gradually adapt to the target domain, with the continually updated teacher efficiently guiding the student; the ACP phase employs a dynamic pruning strategy based on the training epoch, removing unimportant channels according to the loss of the TKD student model. Experimental results demonstrate that the ITMC approach achieves higher accuracy than state-of-the-art methods at the same compression ratio.

Pattern Recognition, Vol. 164, Article 111512. Citations: 0
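The alternation can be sketched as a loop that distills for an epoch, then prunes a growing fraction of channels. ACP in the paper scores channels via the TKD student loss; the L1-norm importance and the 10%-per-epoch schedule below are generic stand-ins chosen only for illustration:

```python
import numpy as np

def prune_channels(weight, keep_ratio):
    """Remove output channels with the smallest L1 norms -- a generic
    stand-in for the ACP channel-importance criterion."""
    norms = np.abs(weight).sum(axis=1)              # per-channel L1 importance
    k = max(1, int(len(norms) * keep_ratio))
    keep = np.sort(np.argsort(norms)[-k:])          # indices of the k largest
    return weight[keep]

# Alternating schedule: distill, then prune a little more each epoch.
w = np.random.randn(64, 128)                        # toy layer: 64 out-channels
for epoch in range(3):
    # ... TKD phase: one distillation epoch would update w here ...
    w = prune_channels(w, keep_ratio=1.0 - 0.1 * (epoch + 1))
```

Interleaving the two steps lets the (hypothetical) distillation phase recover accuracy lost to each pruning round before the next one, which is the motivation for iterating rather than pruning once.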
Deep reinforcement learning for efficient registration between intraoral-scan meshes and CT images
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-03 DOI: 10.1016/j.patcog.2025.111502
Seungpil Choi , Seoyeon Jang , Sunghee Jung , Heon Jae Cho , Byunghwan Jeon
Registration between computed tomography (CT) images and intraoral-scan (IOS) meshes facilitates dental procedure planning. However, the spatial complexity of 3D computations poses a significant challenge, requiring efficient sampling to reduce computational cost while maintaining robustness via global approximation without segmentation. We introduce an efficient and robust method for registering CT images and IOS meshes that eliminates the need for segmentation. We use an effective sampling technique that identifies key vertices in IOS meshes by computing the negative curvatures between adjacent faces. These significant vertices are transformed into a novel graph representation that serves as the input state for a graph-convolution-based backbone network within a deep reinforcement learning (DRL) framework. The framework approximates an optimal solution through sequential decision making, selecting the best of 12 actions (translations and rotations) to accurately locate the 3D mesh at arbitrary positions and angles relative to the maxillary or mandibular teeth in CT images. Evaluated against conventional and deep-learning-based methods, the proposed method achieves mean absolute errors of 1.955 ± 1.310 mm and 1.399 ± 0.644 mm for maxillary and mandibular teeth, respectively, and requires only 0.48 M floating-point operations, making it more efficient than existing methods.

Pattern Recognition, Vol. 164, Article 111502. Citations: 0
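A natural reading of the 12-action space is ± a fixed step of translation along each of x/y/z plus ± a fixed step of rotation about each axis. A sketch under that assumption follows; the pose parameterization `[tx, ty, tz, rx, ry, rz]` and the step sizes are illustrative, not taken from the paper:

```python
import numpy as np

STEP_T, STEP_R = 0.5, np.deg2rad(1.0)               # assumed step sizes (mm, rad)

def apply_action(pose, action):
    """Apply one of 12 discrete actions to a pose [tx, ty, tz, rx, ry, rz]:
    actions 0..5 are +/- translation along x/y/z, 6..11 are +/- rotation
    about x/y/z. Even actions increment, odd actions decrement."""
    pose = pose.copy()
    axis, sign = action // 2, 1 if action % 2 == 0 else -1
    step = STEP_T if axis < 3 else STEP_R
    pose[axis] += sign * step
    return pose
```

An agent iterating such small discrete moves turns rigid registration into sequential decision making, which is the framing the abstract describes.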
Attention-based vector quantized variational autoencoder for anomaly detection by using orthogonal subspace constraints
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-03 DOI: 10.1016/j.patcog.2025.111500
Qien Yu , Shengxin Dai , Ran Dong , Soichiro Ikuno
This paper introduces a new framework that uses a vector quantized variational autoencoder (VQVAE) enhanced by orthogonal subspace constraints (OSC) and pyramid criss-cross attention (PCCA), designed for anomaly detection on industrial product image datasets. Previous studies on modeling low-dimensional feature distributions could not effectively separate normal features from noisy/abnormal information, which the OSC addresses. The vector quantization mechanism is then applied in the two complementary subspaces, yielding normal and abnormal embedding subspaces with discrete representations for normal and noisy information, respectively. The proposed approach robustly represents low-dimensional discrete manifolds, summarizing normal data with a limited number of feature vectors. In addition, two PCCA modules capture feature maps from different layers of the encoder and decoder, benefiting the low-dimensional mapping and reconstruction process: features of different layers serve as the query (Q), key (K), and value (V), capturing both low-level and high-level features and incorporating comprehensive contextual information. The effectiveness of the proposed framework for anomaly detection is assessed against state-of-the-art approaches on various publicly available industrial product image datasets.

Pattern Recognition, Vol. 164, Article 111500. Citations: 0
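The quantization step at the core of any VQVAE maps each latent vector to its nearest codebook entry. A minimal sketch of that lookup is below; the orthogonal subspace constraint and the separate normal/abnormal codebooks described above are not reproduced:

```python
import numpy as np

def vector_quantize(z, codebook):
    """Nearest-codebook-entry lookup: for each latent row of z, return the
    closest code vector (by squared Euclidean distance) and its index."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)                          # index of closest code
    return codebook[idx], idx
```

Anomaly scores in such frameworks typically derive from the reconstruction or quantization error: latents far from every normal code quantize poorly.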
GAD: Domain generalized diabetic retinopathy grading by grade-aware de-stylization
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-03 DOI: 10.1016/j.patcog.2025.111484
Qi Bi , Jingjun Yi , Hao Zheng , Haolan Zhan , Yawen Huang , Wei Ji , Yuexiang Li , Yefeng Zheng
Diabetic retinopathy (DR) is a prevalent complication of diabetes that can lead to vision impairment and blindness, making accurate DR grading essential for early diagnosis and treatment. Most existing DR grading methods assume that training and test images share the same distribution, leaving generalization to unseen target domains inadequately addressed. We observe that images from the same domain, rather than images of the same grade, tend to cluster together in feature space: when the representation of lesions is influenced by style variations, the network tends to memorize features of different image domains in separate channels, which significantly harms generalization. To address this, we propose a global-aware channel similarity that reduces the influence of lesion position and size when measuring feature-space distances. This similarity is used in a grade-aware contrastive learning approach that guides the learning of domain-invariant features by mapping images of the same grade into a compact subspace. Additionally, we develop a multi-scale de-stylization method that explicitly removes style information from the features and compels the model to exploit diverse lesion representations. Extensive experiments on multiple DR grading datasets show the state-of-the-art generalization ability of the proposed method.

Pattern Recognition, Vol. 164, Article 111484. Citations: 0
Progressive class-aware instance enhancement for aircraft detection in remote sensing imagery
IF 7.5 · CAS Tier 1 · Computer Science
Pattern Recognition Pub Date : 2025-03-03 DOI: 10.1016/j.patcog.2025.111503
Tianjun Shi, Jinnan Gong, Jianming Hu, Yu Sun, Guangzhen Bao, Pengfei Zhang, Junjie Wang, Xiyang Zhi, Wei Zhang
Aircraft detection and type identification in optical remote sensing imagery are critical for civilian and military applications, including air traffic control and strategic surveillance. However, existing methods ignore the unique cross-shaped geometric structure and low spatial occupancy of aircraft, leading to inaccurate localization and category confusion. This paper proposes a novel anchor-free detection network that leverages point-set representation, integrating progressive class-aware dual branches (PCA-DB) and an instance-guided enhancement module (IGEM). Considering the underlying structure of aircraft, PCA-DB consists of a coarse foreground-instance branch and a refined cross-shaped branch that facilitate high-quality point-set generation. Through multi-task learning, the auxiliary branches implicitly inject geometric priors into shared features, effectively suppressing background interference. IGEM then introduces an interactive attention mechanism that adaptively fuses the instance-level information of the auxiliary branch with features of the main branches, explicitly enhancing the discriminative features of aircraft. Extensive experiments validate the superior performance of the proposed method on several aircraft datasets, with mAP improvements of 5.42%, 4.28%, and 1.37% over the baseline network on MAR20, FAIR1M-Plane, and CORS-ADD, respectively.

Pattern Recognition, Vol. 164, Article 111503. Citations: 0