SDCoT++: Improved Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection
Na Zhao; Peisheng Qian; Fang Wu; Xun Xu; Xulei Yang; Gim Hee Lee
IEEE Transactions on Image Processing, vol. 34, pp. 4188-4202. Publication date: 2024-12-31. DOI: 10.1109/TIP.2024.3518774. https://ieeexplore.ieee.org/document/10819355/
Deep learning approaches are highly effective for 3D object detection. However, when they learn new classes incrementally without revisiting old data, their performance on previously learned classes often drops sharply. This phenomenon, known as "catastrophic forgetting", hinders 3D object detection in real-world scenarios, where intelligent machines must continually learn to detect previously unseen categories. Furthermore, old and new classes frequently co-occur in the same scenes, and because the old-class objects in the new training scenes are unlabeled, this co-occurrence exacerbates forgetting and confuses the model. To address these challenges, we propose a novel static-dynamic co-teaching approach. Our framework consists of a student model and two teacher models: a static teacher with frozen weights, which passes preserved knowledge of old classes to the student, and a dynamic teacher with continuously updated weights, which transfers knowledge latent in the new data to the student. To address co-occurrence, we generate pseudo labels for base (i.e., old) classes from both the static and the dynamic sources during incremental learning. Additionally, because classes occur with widely varying frequencies, a single fixed threshold for selecting pseudo labels works poorly across classes; we therefore calibrate the base-class probabilities to make them more balanced. Finally, the framework is backbone-agnostic and therefore compatible with different detection architectures; we demonstrate this by adapting three representative 3D object detectors: VoteNet, 3DETR, and CAGroup3D. Extensive experiments on indoor and outdoor benchmark datasets show that our method outperforms baseline approaches and applies across different backbone models.
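The abstract describes the pseudo-labeling pipeline only at a high level. The Python sketch below illustrates one plausible reading of it under assumptions that the abstract does not state: the dynamic teacher is updated as an exponential moving average (EMA) of the student's weights, calibration divides each base-class probability by that class's relative frequency and renormalizes, and duplicate candidates from the two teachers are removed with a same-class center-distance check standing in for 3D non-maximum suppression. The names `Detection`, `ema_update`, `calibrate`, and `base_pseudo_labels`, and all parameter values, are hypothetical and chosen for illustration only; this is not the paper's implementation.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Center = Tuple[float, float, float]


@dataclass
class Detection:
    """Hypothetical teacher output: a box center plus base-class probabilities."""
    center: Center       # real detectors also predict box size and heading
    probs: List[float]   # probabilities over the base (old) classes


def ema_update(teacher: Dict[str, float], student: Dict[str, float],
               alpha: float = 0.99) -> Dict[str, float]:
    """Assumed dynamic-teacher rule: t <- alpha * t + (1 - alpha) * s.
    The static teacher keeps frozen weights and is never updated."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


def calibrate(probs: Sequence[float], class_freq: Sequence[float]) -> List[float]:
    """Assumed prior correction: divide by each class's relative frequency and
    renormalize, raising rare classes relative to frequent ones."""
    scaled = [p / f for p, f in zip(probs, class_freq)]
    total = sum(scaled)
    return [s / total for s in scaled]


def base_pseudo_labels(static_dets: Sequence[Detection],
                       dynamic_dets: Sequence[Detection],
                       class_freq: Sequence[float],
                       threshold: float = 0.6,
                       merge_radius: float = 0.5) -> List[Tuple[Center, int, float]]:
    """Pool candidates from both teachers, calibrate, apply one fixed threshold,
    then drop same-class candidates whose centers lie within merge_radius of a
    higher-scoring kept label (a crude stand-in for 3D NMS)."""
    candidates = []
    for det in list(static_dets) + list(dynamic_dets):
        cal = calibrate(det.probs, class_freq)
        cls = max(range(len(cal)), key=cal.__getitem__)  # most likely class
        if cal[cls] >= threshold:
            candidates.append((det.center, cls, cal[cls]))
    candidates.sort(key=lambda c: c[2], reverse=True)  # highest score first
    kept: List[Tuple[Center, int, float]] = []
    for center, cls, score in candidates:
        if all(k[1] != cls or math.dist(k[0], center) > merge_radius for k in kept):
            kept.append((center, cls, score))
    return kept


if __name__ == "__main__":
    freq = [0.7, 0.3]  # class 0 is frequent, class 1 is rare (made-up values)
    static = [Detection((0.0, 0.0, 0.0), [0.45, 0.55])]
    dynamic = [Detection((0.1, 0.0, 0.0), [0.5, 0.5]),
               Detection((3.0, 0.0, 0.0), [0.9, 0.1])]
    # Two labels survive: class 0 at x=3 and class 1 at the origin.
    print(base_pseudo_labels(static, dynamic, freq))
```

In this toy example, the rare class's raw probability of 0.55 would fail a fixed 0.6 threshold but clears it after calibration (about 0.74), and the near-duplicate box from the dynamic teacher is merged away. The trade-off of this particular calibration choice is that it also lowers the scores of frequent classes.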