SHAA: Spatial Hybrid Attention Network With Adaptive Cross-Entropy Loss Function for UAV-View Geo-Localization

IF 11.1 | Zone 1 (Engineering & Technology) | JCR Q1 (ENGINEERING, ELECTRICAL & ELECTRONIC)

IEEE Transactions on Circuits and Systems for Video Technology | Pub Date: 2025-04-15 | DOI: 10.1109/TCSVT.2025.3560637

Nanhua Chen;Dongshuo Zhang;Kai Jiang;Meng Yu;Yeqing Zhu;Tai-Shan Lou;Liangyu Zhao

Volume 35, Issue 9, pp. 9398-9413 | Open access: no | Record: https://ieeexplore.ieee.org/document/10965775/

Citations: 0

Abstract


SHAA: Spatial Hybrid Attention Network With Adaptive Cross-Entropy Loss Function for UAV-View Geo-Localization

Cross-view geo-localization provides an offline visual positioning strategy for unmanned aerial vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, it still faces the following challenges, leading to suboptimal localization performance: 1) Existing methods primarily focus on extracting global features or local features by partitioning feature maps, neglecting the exploration of spatial information, which is essential for extracting consistent feature representations and aligning images of identical targets across different views. 2) Cross-view geo-localization encounters the challenge of data imbalance between UAV and satellite images. To address these challenges, the Spatial Hybrid Attention Network with Adaptive Cross-Entropy Loss Function (SHAA) is proposed. To tackle the first issue, the Spatial Hybrid Attention (SHA) method employs a Spatial Shift-MLP (SSM) to focus on the spatial geometric correspondences in feature maps across different views, extracting both global features and fine-grained features. Additionally, the SHA method utilizes a Hybrid Attention (HA) mechanism to enhance feature extraction diversity and robustness by capturing interactions between spatial and channel dimensions, thereby extracting consistent cross-view features and aligning images. For the second challenge, the Adaptive Cross-Entropy (ACE) loss function incorporates adaptive weights to emphasize hard samples, alleviating data imbalance issues and improving training effectiveness. Extensive experiments on widely recognized benchmarks, including University-1652, SUES-200, and DenseUAV, demonstrate that SHAA achieves state-of-the-art performance, outperforming existing methods by over 3.92%. Code will be released at: https://github.com/chennanhua001/SHAA.
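The abstract says only that the ACE loss "incorporates adaptive weights to emphasize hard samples"; it does not give the formula. As a rough illustration of that general idea, the sketch below swaps in a focal-loss-style weight, (1 - p_true) ** gamma, which is an assumption and not the authors' ACE definition. The function names and the `gamma` parameter are hypothetical.

```python
import math
from typing import Sequence


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Log-probability of class `index` under a numerically stable softmax."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_norm


def hard_sample_weighted_ce(
    logits: Sequence[Sequence[float]],
    targets: Sequence[int],
    gamma: float = 2.0,
) -> float:
    """Mean cross-entropy where each sample is scaled by (1 - p_true) ** gamma.

    ASSUMPTION: this focal-style weight is a stand-in for the paper's
    adaptive weight, whose exact form the abstract does not state.
    gamma = 0 recovers plain cross-entropy.
    """
    if len(logits) != len(targets) or len(targets) == 0:
        raise ValueError("logits and targets must be non-empty and equal in length")
    total = 0.0
    for row, target in zip(logits, targets):
        log_p = log_softmax_at(row, target)   # always <= 0
        p_true = math.exp(log_p)              # confidence in the true class
        weight = (1.0 - p_true) ** gamma      # near 0 for easy samples
        total += weight * -log_p
    return total / len(targets)


if __name__ == "__main__":
    easy = hard_sample_weighted_ce([[5.0, 0.0]], [0])  # confident and correct
    hard = hard_sample_weighted_ce([[0.0, 5.0]], [0])  # confident and wrong
    print(f"easy sample loss: {easy:.6f}")
    print(f"hard sample loss: {hard:.6f}")
```

With this weighting, a sample the model already classifies confidently contributes almost nothing to the loss, so gradient updates are dominated by misclassified or low-confidence samples. Whether the paper's weights depend on prediction confidence, on view type (UAV versus satellite), or on something else is not stated in the abstract.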


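Source journal

IEEE Transactions on Circuits and Systems for Video Technology (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 13.80

Self-citation rate: 27.40%

Articles published: 660

Review time: 5 months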

About the journal: The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.