{"title":"基于自适应交叉熵损失函数的空间混合注意网络","authors":"Nanhua Chen;Dongshuo Zhang;Kai Jiang;Meng Yu;Yeqing Zhu;Tai-Shan Lou;Liangyu Zhao","doi":"10.1109/TCSVT.2025.3560637","DOIUrl":null,"url":null,"abstract":"Cross-view geo-localization provides an offline visual positioning strategy for unmanned aerial vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, it still faces the following challenges, leading to suboptimal localization performance: 1) Existing methods primarily focus on extracting global features or local features by partitioning feature maps, neglecting the exploration of spatial information, which is essential for extracting consistent feature representations and aligning images of identical targets across different views. 2) Cross-view geo-localization encounters the challenge of data imbalance between UAV and satellite images. To address these challenges, the Spatial Hybrid Attention Network with Adaptive Cross-Entropy Loss Function (SHAA) is proposed. To tackle the first issue, the Spatial Hybrid Attention (SHA) method employs a Spatial Shift-MLP (SSM) to focus on the spatial geometric correspondences in feature maps across different views, extracting both global features and fine-grained features. Additionally, the SHA method utilizes a Hybrid Attention (HA) mechanism to enhance feature extraction diversity and robustness by capturing interactions between spatial and channel dimensions, thereby extracting consistent cross-view features and aligning images. For the second challenge, the Adaptive Cross-Entropy (ACE) loss function incorporates adaptive weights to emphasize hard samples, alleviating data imbalance issues and improving training effectiveness. Extensive experiments on widely recognized benchmarks, including University-1652, SUES-200, and DenseUAV, demonstrate that SHAA achieves state-of-the-art performance, outperforming existing methods by over 3.92%. Code will be released at: <uri>https://github.com/chennanhua001/SHAA</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 9","pages":"9398-9413"},"PeriodicalIF":11.1000,"publicationDate":"2025-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"SHAA: Spatial Hybrid Attention Network With Adaptive Cross-Entropy Loss Function for UAV-View Geo-Localization\",\"authors\":\"Nanhua Chen;Dongshuo Zhang;Kai Jiang;Meng Yu;Yeqing Zhu;Tai-Shan Lou;Liangyu Zhao\",\"doi\":\"10.1109/TCSVT.2025.3560637\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Cross-view geo-localization provides an offline visual positioning strategy for unmanned aerial vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, it still faces the following challenges, leading to suboptimal localization performance: 1) Existing methods primarily focus on extracting global features or local features by partitioning feature maps, neglecting the exploration of spatial information, which is essential for extracting consistent feature representations and aligning images of identical targets across different views. 2) Cross-view geo-localization encounters the challenge of data imbalance between UAV and satellite images. To address these challenges, the Spatial Hybrid Attention Network with Adaptive Cross-Entropy Loss Function (SHAA) is proposed. To tackle the first issue, the Spatial Hybrid Attention (SHA) method employs a Spatial Shift-MLP (SSM) to focus on the spatial geometric correspondences in feature maps across different views, extracting both global features and fine-grained features. Additionally, the SHA method utilizes a Hybrid Attention (HA) mechanism to enhance feature extraction diversity and robustness by capturing interactions between spatial and channel dimensions, thereby extracting consistent cross-view features and aligning images. For the second challenge, the Adaptive Cross-Entropy (ACE) loss function incorporates adaptive weights to emphasize hard samples, alleviating data imbalance issues and improving training effectiveness. Extensive experiments on widely recognized benchmarks, including University-1652, SUES-200, and DenseUAV, demonstrate that SHAA achieves state-of-the-art performance, outperforming existing methods by over 3.92%. Code will be released at: <uri>https://github.com/chennanhua001/SHAA</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 9\",\"pages\":\"9398-9413\"},\"PeriodicalIF\":11.1000,\"publicationDate\":\"2025-04-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10965775/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10965775/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
SHAA: Spatial Hybrid Attention Network With Adaptive Cross-Entropy Loss Function for UAV-View Geo-Localization
Cross-view geo-localization provides an offline visual positioning strategy for unmanned aerial vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, it still faces the following challenges, leading to suboptimal localization performance: 1) Existing methods primarily focus on extracting global features or local features by partitioning feature maps, neglecting the exploration of spatial information, which is essential for extracting consistent feature representations and aligning images of identical targets across different views. 2) Cross-view geo-localization encounters the challenge of data imbalance between UAV and satellite images. To address these challenges, the Spatial Hybrid Attention Network with Adaptive Cross-Entropy Loss Function (SHAA) is proposed. To tackle the first issue, the Spatial Hybrid Attention (SHA) method employs a Spatial Shift-MLP (SSM) to focus on the spatial geometric correspondences in feature maps across different views, extracting both global features and fine-grained features. Additionally, the SHA method utilizes a Hybrid Attention (HA) mechanism to enhance feature extraction diversity and robustness by capturing interactions between spatial and channel dimensions, thereby extracting consistent cross-view features and aligning images. For the second challenge, the Adaptive Cross-Entropy (ACE) loss function incorporates adaptive weights to emphasize hard samples, alleviating data imbalance issues and improving training effectiveness. Extensive experiments on widely recognized benchmarks, including University-1652, SUES-200, and DenseUAV, demonstrate that SHAA achieves state-of-the-art performance, outperforming existing methods by over 3.92%. Code will be released at: https://github.com/chennanhua001/SHAA.
期刊介绍:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.