EMPViT: Efficient multi-path vision transformer for security risks detection in power distribution network

IF 5.5 | Zone 2, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Neurocomputing Pub Date : 2024-11-22 DOI:10.1016/j.neucom.2024.128967

Pan Li, Xiaofang Yuan, Haozhi Xu, Jinlei Wang, Yaonan Wang

Neurocomputing, Volume 617, Article 128967. Full text: https://www.sciencedirect.com/science/article/pii/S0925231224017387

Citations: 0

Abstract


EMPViT: Efficient multi-path vision transformer for security risks detection in power distribution network


To maintain the safe operation of power distribution network (PDN) equipment, it is important to identify security risks accurately and promptly. However, conventional drone-based object detection methods face challenges due to noise and similar features among risk targets, as well as the limited computing resources of unmanned aerial vehicles (UAVs). To address these challenges, an efficient embedding-based multi-path fusion architecture is proposed. This architecture uses a re-parameterized depthwise block to embed local context information at different scales, enhancing the extraction of tiny features while preserving inference speed. Additionally, a coordinated self-attention module is proposed to reduce computational complexity while preserving the ability to capture global information. By fusing fine and coarse feature representations at low computational cost, this module learns efficiently from both local and global image features. The goal is an efficient multi-path vision transformer (EMPViT) architecture that balances accuracy and efficiency. The proposed EMPViT has been evaluated on two different drone image datasets, demonstrating better performance than other architectures. Specifically, EMPViT-S improves detection mAP by 1.2% and speeds up inference by a factor of 1.24 on average on the Drone-PDN dataset. It achieves a similar improvement on the VisDrone-DET2019 dataset, raising detection performance by 1.3% with a 1.2 times speed-up on average.
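The abstract does not say which branches the re-parameterized depthwise block contains, so the following is only a minimal sketch of the general idea of structural re-parameterization, not the authors' block. It assumes a hypothetical training-time layout of three parallel depthwise branches (a 3x3 kernel, a per-channel 1x1 scale, and an identity shortcut) with no normalization layers, and shows that they can be folded into a single 3x3 depthwise kernel that gives the same output at inference time. All function and variable names are illustrative.

```python
import random

def depthwise_conv3x3(x, kernels):
    """Depthwise 3x3 convolution, stride 1, zero padding 1.

    x: list of C channels, each an H x W nested list.
    kernels: list of C 3x3 nested lists, one kernel per channel.
    """
    out = []
    for channel, k in zip(x, kernels):
        h, w = len(channel), len(channel[0])
        result = [[0.0] * w for _ in range(h)]
        for r in range(h):
            for c in range(w):
                for i in range(3):
                    for j in range(3):
                        rr, cc = r + i - 1, c + j - 1
                        if 0 <= rr < h and 0 <= cc < w:  # outside the map counts as zero
                            result[r][c] += k[i][j] * channel[rr][cc]
        out.append(result)
    return out

def multi_branch(x, k3, k1):
    """Assumed training-time form: 3x3 branch + per-channel 1x1 scale + identity."""
    y3 = depthwise_conv3x3(x, k3)
    return [
        [[y3[ch][r][c] + k1[ch] * v + v for c, v in enumerate(row)]
         for r, row in enumerate(channel)]
        for ch, channel in enumerate(x)
    ]

def merge_branches(k3, k1):
    """Fold the 1x1 and identity branches into the centre tap of each 3x3 kernel."""
    merged = [[row[:] for row in k] for k in k3]  # copy so k3 is left untouched
    for ch, scale in enumerate(k1):
        merged[ch][1][1] += scale + 1.0
    return merged

random.seed(0)
C, H, W = 3, 6, 6
x = [[[random.gauss(0, 1) for _ in range(W)] for _ in range(H)] for _ in range(C)]
k3 = [[[random.gauss(0, 1) for _ in range(3)] for _ in range(3)] for _ in range(C)]
k1 = [random.gauss(0, 1) for _ in range(C)]

y_train = multi_branch(x, k3, k1)                        # three branches
y_infer = depthwise_conv3x3(x, merge_branches(k3, k1))   # one merged kernel
max_abs_diff = max(
    abs(a - b)
    for ct, ci in zip(y_train, y_infer)
    for rt, ri in zip(ct, ci)
    for a, b in zip(rt, ri)
)
print(max_abs_diff < 1e-9)
```

Because convolution is linear, the sum of the branches equals one convolution with the summed kernels. This is why a block of this kind can use several branches during training and still run as a single operator at inference.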


Source journal

Neurocomputing (Engineering and Technology: Computer Science, Artificial Intelligence)

CiteScore: 13.10
Self-citation rate: 10.00%
Articles published: 1382
Review time: 70 days

About the journal: Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice and applications are the essential topics being covered.