{"title":"FLDet:更快更轻的空中目标探测器","authors":"Shuyang Wang;Kang Liu;Ju Huang;Xuelong Li","doi":"10.1109/TCSVT.2024.3516760","DOIUrl":null,"url":null,"abstract":"In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that utilizes only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance for detector architecture development, offering a new paradigm for detector development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector’s generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. 
In real scenarios tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at <uri>https://github.com/wsy-yjys/FLDet</uri>.","PeriodicalId":13082,"journal":{"name":"IEEE Transactions on Circuits and Systems for Video Technology","volume":"35 5","pages":"4450-4463"},"PeriodicalIF":8.3000,"publicationDate":"2024-12-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"FLDet: Faster and Lighter Aerial Object Detector\",\"authors\":\"Shuyang Wang;Kang Liu;Ju Huang;Xuelong Li\",\"doi\":\"10.1109/TCSVT.2024.3516760\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that utilizes only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance for detector architecture development, offering a new paradigm for detector development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. 
Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector’s generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. In real scenarios tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at <uri>https://github.com/wsy-yjys/FLDet</uri>.\",\"PeriodicalId\":13082,\"journal\":{\"name\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"volume\":\"35 5\",\"pages\":\"4450-4463\"},\"PeriodicalIF\":8.3000,\"publicationDate\":\"2024-12-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Circuits and Systems for Video Technology\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10798479/\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Circuits and Systems for Video 
Technology","FirstCategoryId":"5","ListUrlMain":"https://ieeexplore.ieee.org/document/10798479/","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Abstract: In the rapidly evolving field of unmanned aerial vehicles (UAVs), real-time object detection is crucial for enhancing UAV intelligence. However, existing research often prioritizes complex networks to boost performance, neglecting the inherent computational resource constraints of UAVs. This paper presents FLDet, a family of faster and lighter detectors specifically designed for UAVs. By revisiting the architecture of modern lightweight detectors from a top-down perspective, FLDet offers a novel and comprehensive redesign of the head, neck, and backbone components. Firstly, we propose a Scale Sparse Head (SSH) that uses only two heads to detect objects of varying sizes, leveraging scale sparse feature pyramids to balance performance and efficiency. This design provides heuristic guidance and a new paradigm for detector architecture development. Secondly, a Partial Interaction Neck (PIN) is introduced to facilitate partial interaction between different feature scales, thereby reducing computational costs while effectively integrating multi-scale information. Thirdly, inspired by the primate visual pathway, a Stage-Wise Heterogeneous Network (SHN) is presented, employing heterogeneous blocks to capture both local details and contextual information. Finally, we develop a training strategy called Decay Data Augmentation (DDA) to enhance the detector's generalization capability, leveraging diverse representations generated by strong data augmentation techniques. Experimental results on two challenging UAV-view detection benchmarks, VisDrone2019 and UAVDT, demonstrate that FLDet achieves a state-of-the-art balance among accuracy, latency, and parameter efficiency. In real-world tests, the fastest variant, FLDet-N, achieves real-time performance exceeding 52 FPS on an NVIDIA Jetson Xavier NX with only 1.2M parameters. The source code is available at https://github.com/wsy-yjys/FLDet.
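The Decay Data Augmentation idea described above, applying strong augmentation heavily early in training and decaying it so later epochs see near-clean data, can be sketched as a simple probability schedule. The linear decay shape, the probability bounds, and the function names below are illustrative assumptions for exposition, not FLDet's published schedule:

```python
import random


def dda_strength(epoch, total_epochs, p_max=1.0, p_min=0.0):
    """Probability of applying strong augmentation at a given epoch.

    Decays linearly from p_max at epoch 0 to p_min at the final epoch.
    (A linear schedule and these bounds are assumptions, not the
    paper's actual settings.)
    """
    frac = epoch / max(total_epochs - 1, 1)
    return p_max + (p_min - p_max) * frac


def maybe_strong_augment(image, epoch, total_epochs, strong_fn, weak_fn):
    """Apply strong augmentation with decaying probability, else weak.

    Early epochs expose the detector to diverse, heavily augmented
    samples; later epochs converge on the undistorted distribution.
    """
    if random.random() < dda_strength(epoch, total_epochs):
        return strong_fn(image)
    return weak_fn(image)
```

In a training loop, `maybe_strong_augment` would wrap the per-sample augmentation pipeline, with `strong_fn` standing in for aggressive transforms (e.g. heavy color jitter or mixing) and `weak_fn` for the baseline preprocessing.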
About the journal:
The IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) is dedicated to covering all aspects of video technologies from a circuits and systems perspective. We encourage submissions of general, theoretical, and application-oriented papers related to image and video acquisition, representation, presentation, and display. Additionally, we welcome contributions in areas such as processing, filtering, and transforms; analysis and synthesis; learning and understanding; compression, transmission, communication, and networking; as well as storage, retrieval, indexing, and search. Furthermore, papers focusing on hardware and software design and implementation are highly valued. Join us in advancing the field of video technology through innovative research and insights.