{"title":"AdaCoRCE损失知识蒸馏:网络裂变与协同教学技术的新方法","authors":"Shankey Garg;Pradeep Singh","doi":"10.1109/TAI.2025.3527402","DOIUrl":null,"url":null,"abstract":"Deep models have been successful in almost every research field, and they are capable of handling complex problem statements. But most of the deep neural networks are huge in size with millions/billions of parameters requiring heavy resources and computations to be installed in edge devices. In this article, we present an efficient co-teaching strategy consisting of multiple small networks performing mutually at runtime to consistently improve the efficiency and generalization ability of neural networks. Unlike existing distillation mechanism, that utilizes large capacity pre-train teacher model to transfer knowledge to a smaller network unidirectionally, proposed framework treats all the networks as ‘teacher’ (student-sized) and co-teach them allowing them to compute concurrently and quickly with better generalizations. We have carefully divided the backbone network into small network using depth scaling with regularizations. Multiple small networks are used during the co-teaching process, and the proposed AdaCoRCE loss is used to make the network learn from each other. During training, these networks are provided with the two different views of same data to increase their diversity. Co-teaching scheme allows model to fetch stronger and unique representation of knowledge by using different data views and AdaCoRCE loss. This article provides a generalized framework that could be applied to various network structures (e.g., MobileNets, ResNet, MixNet, etc.) and it demonstrates efficient performance on variety of histology image datasets. In this article, we have used four different publicly available histology dataset on two types of diseases to evaluate the performance of proposed technique. Analysis on colorectal cancer and breast cancer histology images suggests that the proposed model enhances the overall performance of the model in terms of accuracy, GFLOPs and inference time. Further, the proposed framework is also analyzed using benchmark cifar-10 dataset and comparison of our result is done with several state-of-the-art results on mutual/collaborative learning. To the best of our knowledge, we analyzed that the proposed model outperformed these recent models in terms of accuracy, GFLOPs and inference time. Extensive result analysis on different histology benchmark datasets and benchmark cifar-10 dataset suggests that the proposed model is a generally applicable model that could be used for various computer vision-based tasks.","PeriodicalId":73305,"journal":{"name":"IEEE transactions on artificial intelligence","volume":"6 7","pages":"1776-1786"},"PeriodicalIF":0.0000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AdaCoRCE Loss for Knowledge Distillation: A Novel Approach With Network Fission and Co-Teaching Technique\",\"authors\":\"Shankey Garg;Pradeep Singh\",\"doi\":\"10.1109/TAI.2025.3527402\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep models have been successful in almost every research field, and they are capable of handling complex problem statements. But most of the deep neural networks are huge in size with millions/billions of parameters requiring heavy resources and computations to be installed in edge devices. 
In this article, we present an efficient co-teaching strategy consisting of multiple small networks performing mutually at runtime to consistently improve the efficiency and generalization ability of neural networks. Unlike existing distillation mechanism, that utilizes large capacity pre-train teacher model to transfer knowledge to a smaller network unidirectionally, proposed framework treats all the networks as ‘teacher’ (student-sized) and co-teach them allowing them to compute concurrently and quickly with better generalizations. We have carefully divided the backbone network into small network using depth scaling with regularizations. Multiple small networks are used during the co-teaching process, and the proposed AdaCoRCE loss is used to make the network learn from each other. During training, these networks are provided with the two different views of same data to increase their diversity. Co-teaching scheme allows model to fetch stronger and unique representation of knowledge by using different data views and AdaCoRCE loss. This article provides a generalized framework that could be applied to various network structures (e.g., MobileNets, ResNet, MixNet, etc.) and it demonstrates efficient performance on variety of histology image datasets. In this article, we have used four different publicly available histology dataset on two types of diseases to evaluate the performance of proposed technique. Analysis on colorectal cancer and breast cancer histology images suggests that the proposed model enhances the overall performance of the model in terms of accuracy, GFLOPs and inference time. Further, the proposed framework is also analyzed using benchmark cifar-10 dataset and comparison of our result is done with several state-of-the-art results on mutual/collaborative learning. To the best of our knowledge, we analyzed that the proposed model outperformed these recent models in terms of accuracy, GFLOPs and inference time. Extensive result analysis on different histology benchmark datasets and benchmark cifar-10 dataset suggests that the proposed model is a generally applicable model that could be used for various computer vision-based tasks.\",\"PeriodicalId\":73305,\"journal\":{\"name\":\"IEEE transactions on artificial intelligence\",\"volume\":\"6 7\",\"pages\":\"1776-1786\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2025-01-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE transactions on artificial intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10838595/\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on artificial intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10838595/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
AdaCoRCE Loss for Knowledge Distillation: A Novel Approach With Network Fission and Co-Teaching Technique
Deep models have been successful in almost every research field and are capable of handling complex problems. However, most deep neural networks are very large, with millions or billions of parameters, and demand heavy resources and computation, which makes them difficult to deploy on edge devices. In this article, we present an efficient co-teaching strategy in which multiple small networks train one another at runtime to consistently improve the efficiency and generalization ability of neural networks. Unlike existing distillation mechanisms, which use a large pretrained teacher model to transfer knowledge to a smaller network unidirectionally, the proposed framework treats all of the (student-sized) networks as teachers and co-teaches them, allowing them to compute concurrently and quickly with better generalization. We carefully divide the backbone network into small networks using depth scaling with regularization. Multiple small networks participate in the co-teaching process, and the proposed AdaCoRCE loss enables them to learn from one another. During training, the networks are given two different views of the same data to increase their diversity. The co-teaching scheme allows the model to acquire stronger and more distinctive representations of knowledge by combining the different data views with the AdaCoRCE loss. The framework is general and can be applied to various network structures (e.g., MobileNets, ResNet, MixNet), and it demonstrates efficient performance on a variety of histology image datasets. We use four publicly available histology datasets covering two diseases to evaluate the proposed technique. Analysis on colorectal cancer and breast cancer histology images shows that the proposed approach improves overall performance in terms of accuracy, GFLOPs, and inference time. The framework is further evaluated on the benchmark CIFAR-10 dataset, where our results are compared with several state-of-the-art results on mutual/collaborative learning; to the best of our knowledge, the proposed model outperforms these recent models in terms of accuracy, GFLOPs, and inference time. Extensive analysis on the histology benchmarks and on CIFAR-10 suggests that the proposed model is generally applicable and could be used for a range of computer vision tasks.
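To make the co-teaching idea concrete, the sketch below shows one possible training step for a pair of peer networks, each fed a different augmented view of the same batch. This is a minimal illustration, not the authors' implementation: the exact form of the AdaCoRCE loss is not given in the abstract, so the `adacorce_like_loss` function here is a hypothetical stand-in that combines cross-entropy with a peer-consistency KL term weighted by an assumed adaptive coefficient.

```python
# Hedged sketch of a two-network co-teaching step with different data views.
# `adacorce_like_loss` and its adaptive weight `alpha` are assumptions standing
# in for the paper's AdaCoRCE loss, whose precise definition is not in the abstract.
import torch
import torch.nn.functional as F


def adacorce_like_loss(logits_self, logits_peer, targets, temperature=2.0):
    """Cross-entropy on ground truth plus a peer-consistency (KL) term.

    `alpha` scales the peer term by the peer's average confidence on the true
    class; this adaptive weighting is a hypothetical choice for illustration.
    """
    ce = F.cross_entropy(logits_self, targets)
    peer_probs = F.softmax(logits_peer.detach() / temperature, dim=1)
    self_log_probs = F.log_softmax(logits_self / temperature, dim=1)
    kl = F.kl_div(self_log_probs, peer_probs, reduction="batchmean") * temperature ** 2
    alpha = peer_probs.gather(1, targets.unsqueeze(1)).mean()
    return ce + alpha * kl


def co_teach_step(net_a, net_b, view_a, view_b, targets, opt_a, opt_b):
    """One co-teaching step: each peer sees a different view of the same batch
    and learns from both the labels and the other peer's predictions."""
    logits_a = net_a(view_a)
    logits_b = net_b(view_b)

    # Update network A using B's predictions as the peer signal.
    opt_a.zero_grad()
    adacorce_like_loss(logits_a, logits_b, targets).backward()
    opt_a.step()

    # Update network B using A's (detached) predictions as the peer signal.
    opt_b.zero_grad()
    adacorce_like_loss(logits_b, logits_a.detach(), targets).backward()
    opt_b.step()
```

With more than two peers, the same step would simply average the peer-consistency terms over all other networks; the two-network case is shown only to keep the sketch short.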