CLEAN: Category Knowledge-Driven Compression Framework for Efficient 3D Object Detection
Authors: Haonan Zhang, Longjun Liu, Fei Hui, Bo Zhang, Hengmin Zhang, Zhiyuan Zha
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence (impact factor 20.8; JCR Q1, Computer Science, Artificial Intelligence)
Publication date: 2025-06-24
DOI: https://doi.org/10.1109/tpami.2025.3582706
Deep neural networks (DNNs) are potent in LiDAR-based 3D object detection (LiDAR-3DOD), yet their deployment remains daunting due to their cumbersome parameters and computations. Knowledge distillation (KD) is promising for compressing DNNs in LiDAR-3DOD. However, most existing KD methods transfer inadequate knowledge between homogeneous detectors, and do not thoroughly explore optimal student architectures, resulting in insufficient gains for compact student detectors. To this end, we propose a category knowledge-driven compression framework to achieve efficient LiDAR-based 3D detectors. Firstly, we distill knowledge from two-stage teacher detectors to one-stage student detectors, overcoming the limitations of homogeneous pairs. To conduct KD in these heterogeneous pairs, we explore the gap between heterogeneous detectors, and introduce category knowledge-driven KD (CaKD), which includes both student-oriented distillation and two-stage-oriented label assignment distillation. Secondly, to search for the optimal architecture of compact student detectors, we introduce a masked category knowledge-driven structured pruning scheme. This scheme evaluates filter importance by analyzing the changes in category predictions related to foreground regions before and after filter removal, and prunes the less important filters accordingly. Finally, we propose a modified IoU-aware redundancy elimination module to remove redundant false positive samples, thereby further improving the accuracy of detectors. Experiments on various point cloud datasets demonstrate that our method delivers impressive results. For example, on KITTI, several compressed one-stage detectors outperform two-stage detectors in both efficiency and accuracy. Besides, on WOD-mini, our framework reduces the memory footprint of CenterPoint by 5.2× and improves the L2 mAPH by 0.55%.
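The abstract describes the pruning criterion only in words, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes a toy two-layer classification head, measures the "change in category predictions" as an L1 difference between class probabilities, and restricts that difference to foreground locations with a binary mask. All names (`class_scores`, `masked_filter_importance`, `select_filters`) and the choice of L1 distance are hypothetical.

```python
import numpy as np


def class_scores(features, weights, filter_mask):
    """Class probabilities per location from a toy two-layer head (assumed).

    features:    (N, C_in) input features, one row per location
    weights:     dict with "w1" of shape (C_in, F) and "w2" of shape (F, K)
    filter_mask: (F,) array, 1.0 keeps a filter and 0.0 removes it
    """
    hidden = np.maximum(features @ weights["w1"], 0.0) * filter_mask
    logits = hidden @ weights["w2"]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)


def masked_filter_importance(features, weights, foreground):
    """Score each filter by the foreground-only change in class predictions.

    foreground: (N,) binary mask, 1.0 where a location lies on an object.
    Returns an (F,) array; larger values mean removal changes predictions more.
    """
    n_filters = weights["w1"].shape[1]
    keep_all = np.ones(n_filters)
    base = class_scores(features, weights, keep_all)
    denom = max(float(foreground.sum()), 1.0)
    importance = np.zeros(n_filters)
    for f in range(n_filters):
        mask = keep_all.copy()
        mask[f] = 0.0  # simulate removing filter f
        pruned = class_scores(features, weights, mask)
        change = np.abs(base - pruned).sum(axis=1)  # L1 change per location
        importance[f] = float((change * foreground).sum()) / denom
    return importance


def select_filters(importance, keep_ratio):
    """Return sorted indices of the most important filters to keep."""
    n_keep = max(1, int(round(keep_ratio * importance.size)))
    return np.sort(np.argsort(importance)[::-1][:n_keep])


# Toy data: 20 locations, 4 input channels, 6 filters, 3 classes.
features = np.arange(1, 81, dtype=float).reshape(20, 4) / 10.0
weights = {
    "w1": 0.1 * np.ones((4, 6)),
    "w2": np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0],
                    [0, 0, 1], [1, -1, 0], [0, 1, -1]], dtype=float),
}
foreground = np.concatenate([np.ones(10), np.zeros(10)])

importance = masked_filter_importance(features, weights, foreground)
kept = select_filters(importance, keep_ratio=0.5)
```

In this toy setup the third filter has an all-zero output row, so removing it leaves the predictions unchanged and it is ranked as the least important. A real detector would require fine-tuning after pruning, which this sketch omits.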
About the journal:
The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition and relevant specialized hardware and/or software architectures are also covered.