SDCoT++: Improved Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection

Impact factor: 13.7

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society. Pub Date: 2024-12-31. DOI: 10.1109/TIP.2024.3518774

Na Zhao;Peisheng Qian;Fang Wu;Xun Xu;Xulei Yang;Gim Hee Lee

Volume 34, pages 4188-4202. Journal article. Source: https://ieeexplore.ieee.org/document/10819355/

Citations: 0


SDCoT++: Improved Static-Dynamic Co-Teaching for Class-Incremental 3D Object Detection

Deep learning approaches have proven highly effective in 3D object detection. However, when they learn new classes incrementally without revisiting the old data, their performance on previously trained classes often drops notably. This “catastrophic forgetting” phenomenon impedes 3D object detection in real-world scenarios, where intelligent machines must continuously learn to detect previously unseen categories. Furthermore, frequent co-occurrence of old and new classes in scenes exacerbates catastrophic forgetting and causes model confusion. To address these challenges, we propose a novel static-dynamic co-teaching approach. Our framework involves a student model and two teacher models: a static teacher with fixed weights, which imparts preserved old knowledge to the student, and a dynamic teacher with continuously updated weights, which transfers underlying knowledge from new data to the student. To mitigate the co-occurrence issue, we generate pseudo labels for base (i.e., old) classes from both static and dynamic sources during incremental learning. Additionally, varying class occurrence frequencies negatively affect fixed thresholding during pseudo-label selection; to mitigate this, we calibrate the probabilities of base classes to attain more balanced class probabilities. Moreover, our static-dynamic co-teaching framework is backbone-agnostic, making it compatible with different detection architectures. We demonstrate this by adapting three representative 3D object detectors: VoteNet, 3DETR, and CAGroup3D. Extensive experiments on indoor and outdoor benchmark datasets show that our method outperforms baseline approaches and applies across different backbone models.
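The two-teacher idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract says only that the dynamic teacher has "continuously updated weights"; the exponential moving average (EMA) rule below is an assumption borrowed from mean-teacher style training. The names `ema_update`, `Detection`, and `select_pseudo_labels`, the `decay` value, and the `threshold` value are all hypothetical, and the probability calibration step is omitted because the abstract does not describe how it is computed.

```python
from dataclasses import dataclass


def ema_update(teacher, student, decay=0.99):
    """Return dynamic-teacher weights moved slightly toward the student.

    ASSUMPTION: an EMA rule; the abstract only says the dynamic teacher's
    weights are continuously updated. Weights are plain name -> float dicts.
    """
    return {
        name: decay * teacher[name] + (1.0 - decay) * student[name]
        for name in teacher
    }


@dataclass(frozen=True)
class Detection:
    """Hypothetical detection record: a class label and a confidence score."""
    label: str
    score: float


def select_pseudo_labels(static_dets, dynamic_dets, base_classes, threshold=0.9):
    """Keep confident base-class detections from both teachers.

    Detections of non-base classes, or with a score below the fixed
    threshold, are discarded. No duplicate removal is attempted here.
    """
    kept = []
    for det in list(static_dets) + list(dynamic_dets):
        if det.label in base_classes and det.score >= threshold:
            kept.append(det)
    return kept


if __name__ == "__main__":
    # The static teacher is never updated; only the dynamic teacher moves.
    dynamic_teacher = ema_update({"w": 1.0}, {"w": 0.0}, decay=0.9)
    print(dynamic_teacher)

    pseudo = select_pseudo_labels(
        static_dets=[Detection("chair", 0.95), Detection("table", 0.50)],
        dynamic_dets=[Detection("chair", 0.92), Detection("sofa", 0.99)],
        base_classes={"chair", "table"},
    )
    print([d.label for d in pseudo])
```

In this sketch, a single fixed threshold is applied to every base class. Rare classes tend to receive lower scores and are filtered out more often, which is the imbalance the abstract's calibration step is meant to reduce.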


Source journal

IEEE transactions on image processing : a publication of the IEEE Signal Processing Society

Self-citation rate

0.00%

Publication count