{"title":"FLDet:更快更轻的空中目标探测器","authors":"Shuyang Wang;Kang Liu;Ju Huang;Xuelong Li","doi":"10.1109/TCSVT.2024.3516760","DOIUrl":null,"url":null,"abstract":"In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that utilizes only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance for detector architecture development, offering a new paradigm for detector development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector’s generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. 
In real scenarios tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at <uri>https://github.com/wsy-yjys/FLDet</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4450-4463"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FLDet: Faster and Lighter Aerial Object Detector\",\"authors\":\"Shuyang Wang;Kang Liu;Ju Huang;Xuelong Li\",\"doi\":\"10.1109/TCSVT.2024.3516760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that utilizes only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance for detector architecture development, offering a new paradigm for detector development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. 
Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector’s generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. In real scenarios tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at <uri>https://github.com/wsy-yjys/FLDet</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4450-4463\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10798479/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video 
Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10798479/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract: In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that uses only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance and a new paradigm for detector architecture development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector's generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. In real-world tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at https://github.com/wsy-yjys/FLDet.
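The Decay Data Augmentation idea described above, applying strong augmentation heavily early in training and decaying it so later epochs see near-clean data, can be sketched as a simple probability schedule. The linear decay shape, the probability bounds, and the function names below are illustrative assumptions for exposition, not FLDet's published schedule:

```python
import random


def dda_strength(epoch, total_epochs, p_max=1.0, p_min=0.0):
    """Probability of applying strong augmentation at a given epoch.

    Decays linearly from p_max at epoch 0 to p_min at the final epoch.
    (A linear schedule and these bounds are assumptions, not the
    paper's actual settings.)
    """
    frac = epoch / max(total_epochs - 1, 1)
    return p_max + (p_min - p_max) * frac


def maybe_strong_augment(image, epoch, total_epochs, strong_fn, weak_fn):
    """Apply strong augmentation with decaying probability, else weak.

    Early epochs expose the detector to diverse, heavily augmented
    samples; later epochs converge on the undistorted distribution.
    """
    if random.random() < dda_strength(epoch, total_epochs):
        return strong_fn(image)
    return weak_fn(image)
```

In a training loop, `maybe_strong_augment` would wrap the per-sample augmentation pipeline, with `strong_fn` standing in for aggressive transforms (e.g. heavy color jitter or mixing) and `weak_fn` for the baseline preprocessing.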
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.