{"title":"AdaCoRCE损失知识蒸馏:网络裂变与协同教学技术的新方法","authors":"Shankey Garg;Pradeep Singh","doi":"10.1109/TAI.2025.3527402","DOIUrl":null,"url":null,"abstract":"Deep models have been successful in almost every research field, and they are capable of handling complex problem statements. But most of the deep neural networks are huge in size with millions/billions of parameters requiring heavy resources and computations to be installed in edge devices. In this article, we present an efficient co-teaching strategy consisting of multiple small networks performing mutually at runtime to consistently improve the efficiency and generalization ability of neural networks. Unlike existing distillation mechanism, that utilizes large capacity pre-train teacher model to transfer knowledge to a smaller network unidirectionally, proposed framework treats all the networks as ‘teacher’ (student-sized) and co-teach them allowing them to compute concurrently and quickly with better generalizations. We have carefully divided the backbone network into small network using depth scaling with regularizations. Multiple small networks are used during the co-teaching process, and the proposed AdaCoRCE loss is used to make the network learn from each other. During training, these networks are provided with the two different views of same data to increase their diversity. Co-teaching scheme allows model to fetch stronger and unique representation of knowledge by using different data views and AdaCoRCE loss. This article provides a generalized framework that could be applied to various network structures (e.g., MobileNets, ResNet, MixNet, etc.) and it demonstrates efficient performance on variety of histology image datasets. In this article, we have used four different publicly available histology dataset on two types of diseases to evaluate the performance of proposed technique. Analysis on colorectal cancer and breast cancer histology images suggests that the proposed model enhances the overall performance of the model in terms of accuracy, GFLOPs and inference time. Further, the proposed framework is also analyzed using benchmark cifar-10 dataset and comparison of our result is done with several state-of-the-art results on mutual/collaborative learning. To the best of our knowledge, we analyzed that the proposed model outperformed these recent models in terms of accuracy, GFLOPs and inference time. Extensive result analysis on different histology benchmark datasets and benchmark cifar-10 dataset suggests that the proposed model is a generally applicable model that could be used for various computer vision-based tasks.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 7","pages":"1776-1786"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AdaCoRCE Loss for Knowledge Distillation: A Novel Approach With Network Fission and Co-Teaching Technique\",\"authors\":\"Shankey Garg;Pradeep Singh\",\"doi\":\"10.1109/TAI.2025.3527402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep models have been successful in almost every research field, and they are capable of handling complex problem statements. But most of the deep neural networks are huge in size with millions/billions of parameters requiring heavy resources and computations to be installed in edge devices. 
In this article, we present an efficient co-teaching strategy consisting of multiple small networks performing mutually at runtime to consistently improve the efficiency and generalization ability of neural networks. Unlike existing distillation mechanism, that utilizes large capacity pre-train teacher model to transfer knowledge to a smaller network unidirectionally, proposed framework treats all the networks as ‘teacher’ (student-sized) and co-teach them allowing them to compute concurrently and quickly with better generalizations. We have carefully divided the backbone network into small network using depth scaling with regularizations. Multiple small networks are used during the co-teaching process, and the proposed AdaCoRCE loss is used to make the network learn from each other. During training, these networks are provided with the two different views of same data to increase their diversity. Co-teaching scheme allows model to fetch stronger and unique representation of knowledge by using different data views and AdaCoRCE loss. This article provides a generalized framework that could be applied to various network structures (e.g., MobileNets, ResNet, MixNet, etc.) and it demonstrates efficient performance on variety of histology image datasets. In this article, we have used four different publicly available histology dataset on two types of diseases to evaluate the performance of proposed technique. Analysis on colorectal cancer and breast cancer histology images suggests that the proposed model enhances the overall performance of the model in terms of accuracy, GFLOPs and inference time. Further, the proposed framework is also analyzed using benchmark cifar-10 dataset and comparison of our result is done with several state-of-the-art results on mutual/collaborative learning. To the best of our knowledge, we analyzed that the proposed model outperformed these recent models in terms of accuracy, GFLOPs and inference time. Extensive result analysis on different histology benchmark datasets and benchmark cifar-10 dataset suggests that the proposed model is a generally applicable model that could be used for various computer vision-based tasks.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 7\",\"pages\":\"1776-1786\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10838595/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10838595/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
AdaCoRCE Loss for Knowledge Distillation: A Novel Approach With Network Fission and Co-Teaching Technique
Deep models have been successful in almost every research field and are capable of handling complex problems. However, most deep neural networks are very large, with millions or billions of parameters, and demand heavy resources and computation, which makes them difficult to deploy on edge devices. In this article, we present an efficient co-teaching strategy in which multiple small networks train one another at runtime to consistently improve the efficiency and generalization ability of neural networks. Unlike existing distillation mechanisms, which use a large pretrained teacher model to transfer knowledge to a smaller network unidirectionally, the proposed framework treats all of the (student-sized) networks as teachers and co-teaches them, allowing them to compute concurrently and quickly with better generalization. We carefully divide the backbone network into small networks using depth scaling with regularization. Multiple small networks participate in the co-teaching process, and the proposed AdaCoRCE loss enables them to learn from one another. During training, the networks are given two different views of the same data to increase their diversity. The co-teaching scheme allows the model to acquire stronger and more distinctive representations of knowledge by combining the different data views with the AdaCoRCE loss. The framework is general and can be applied to various network structures (e.g., MobileNets, ResNet, MixNet), and it demonstrates efficient performance on a variety of histology image datasets. We use four publicly available histology datasets covering two diseases to evaluate the proposed technique. Analysis on colorectal cancer and breast cancer histology images shows that the proposed approach improves overall performance in terms of accuracy, GFLOPs, and inference time. The framework is further evaluated on the benchmark CIFAR-10 dataset, where our results are compared with several state-of-the-art results on mutual/collaborative learning; to the best of our knowledge, the proposed model outperforms these recent models in terms of accuracy, GFLOPs, and inference time. Extensive analysis on the histology benchmarks and on CIFAR-10 suggests that the proposed model is generally applicable and could be used for a range of computer vision tasks.
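To make the co-teaching idea concrete, the sketch below shows one possible training step for a pair of peer networks, each fed a different augmented view of the same batch. This is a minimal illustration, not the authors' implementation: the exact form of the AdaCoRCE loss is not given in the abstract, so the `adacorce_like_loss` function here is a hypothetical stand-in that combines cross-entropy with a peer-consistency KL term weighted by an assumed adaptive coefficient.

```python
# Hedged sketch of a two-network co-teaching step with different data views.
# `adacorce_like_loss` and its adaptive weight `alpha` are assumptions standing
# in for the paper's AdaCoRCE loss, whose precise definition is not in the abstract.
import torch
import torch.nn.functional as F


def adacorce_like_loss(logits_self, logits_peer, targets, temperature=2.0):
    """Cross-entropy on ground truth plus a peer-consistency (KL) term.

    `alpha` scales the peer term by the peer's average confidence on the true
    class; this adaptive weighting is a hypothetical choice for illustration.
    """
    ce = F.cross_entropy(logits_self, targets)
    peer_probs = F.softmax(logits_peer.detach() / temperature, dim=1)
    self_log_probs = F.log_softmax(logits_self / temperature, dim=1)
    kl = F.kl_div(self_log_probs, peer_probs, reduction="batchmean") * temperature ** 2
    alpha = peer_probs.gather(1, targets.unsqueeze(1)).mean()
    return ce + alpha * kl


def co_teach_step(net_a, net_b, view_a, view_b, targets, opt_a, opt_b):
    """One co-teaching step: each peer sees a different view of the same batch
    and learns from both the labels and the other peer's predictions."""
    logits_a = net_a(view_a)
    logits_b = net_b(view_b)

    # Update network A using B's predictions as the peer signal.
    opt_a.zero_grad()
    adacorce_like_loss(logits_a, logits_b, targets).backward()
    opt_a.step()

    # Update network B using A's (detached) predictions as the peer signal.
    opt_b.zero_grad()
    adacorce_like_loss(logits_b, logits_a.detach(), targets).backward()
    opt_b.step()
```

With more than two peers, the same step would simply average the peer-consistency terms over all other networks; the two-network case is shown only to keep the sketch short.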