MLK-TR: a Multi-branch Large Kernel TRansformer for UAV-based images
Authors: Xun Li, Yuzhen Zhao, Yang Zhao, Zhun Guo, Jianjing Gao, Baoxi Yuan
Journal: Complex & Intelligent Systems (JCR Q1, Computer Science, Artificial Intelligence; impact factor 5.0)
Published: 2025-05-08
DOI: 10.1007/s40747-025-01901-0 (https://doi.org/10.1007/s40747-025-01901-0)
Citations: 0
Abstract
Object detection from unmanned aerial vehicles (UAVs) uses onboard visual sensors to automatically identify and locate ground targets. However, because targets captured by UAVs are small and subject to scale variation and blurred edges, existing methods struggle to maintain high detection accuracy while keeping inference efficient. To address this, this paper proposes a Multi-branch Large-Kernel TRansformer network (MLK-TR) for small-target detection in UAV scenarios. Compared with existing detectors, MLK-TR improves detection performance through four innovations. First, the proposed Sparse Large-Kernel Attention Mechanism (SLK-Atten) selects key information in the image by sparsifying feature representations. Second, the C3PA2 module enhances the detector’s feature extraction capability, improving its focus on foreground targets. Third, the Frequent Interaction Feature Fusion Network (FIFFN) facilitates feature interaction between different levels, enhancing the detector’s adaptability to different scales. Finally, super high-resolution prediction feature maps are introduced to enhance edge details, improving the detector’s sensitivity to small targets. Notably, the proposed modules can be easily integrated into the YOLO series framework. Compared with the original YOLO11n, MLK-TR improves mAP50 by 9% on the publicly available VisDrone dataset, by 1.9% on the UAVDT dataset, and by 3.6% on the PVD dataset. These results confirm the effectiveness of MLK-TR in addressing the complexities of UAV object detection.
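The abstract reports its gains in mAP50, which counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following is a minimal sketch of that matching criterion only, not the paper's evaluation code. The function names `iou` and `is_true_positive` and the example boxes are hypothetical, and a full mAP50 computation additionally ranks detections by confidence and averages precision over recall levels and classes.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """Matching rule behind mAP50: a prediction counts if IoU >= 0.5."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical boxes: the same 4-pixel localization error on a small and a large target.
small_gt, small_pred = (0, 0, 10, 10), (4, 0, 14, 10)
large_gt, large_pred = (0, 0, 100, 100), (4, 0, 104, 100)

print(is_true_positive(small_pred, small_gt))  # IoU = 60 / 140, below the threshold
print(is_true_positive(large_pred, large_gt))  # IoU = 9600 / 10400, above the threshold
```

In this example the 4-pixel shift turns the 10 × 10 prediction into a miss while the 100 × 100 prediction still matches, which illustrates why small targets are especially sensitive to localization error under this metric.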
About the journal:
Complex & Intelligent Systems aims to provide a forum for presenting and discussing novel approaches, tools and techniques meant for attaining a cross-fertilization between the broad fields of complex systems, computational simulation, and intelligent analytics and visualization. The transdisciplinary research that the journal focuses on will expand the boundaries of our understanding by investigating the principles and processes that underlie many of the most profound problems facing society today.