Knowledge Distillation Based on Narrow-Deep Networks

Impact factor: 2.8 · Zone 4 (Computer Science) · JCR Q3 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)

Neural Processing Letters Pub Date : 2024-06-06 DOI:10.1007/s11063-024-11646-5

Yan Zhou, Zhiqiang Wang, Jianxun Li

Citations: 0

Abstract

Deep neural networks perform better than shallow neural networks, but they tend to be deeper or wider, which introduces large numbers of parameters and computations. Networks that are too wide have a high risk of overfitting, and networks that are too deep require a large amount of computation. This paper proposes a narrow-deep ResNet that increases the depth of the network while avoiding the issues caused by making the network too wide. It also uses knowledge distillation: a trained teacher model is used to train unmodified, wide, and narrow-deep ResNet students, which learn from the teacher's output. To validate the method, it is tested on the Cifar-100 and Pascal VOC datasets. The proposed method allows a small model to reach about the same accuracy as a large model while dramatically reducing response time and computational effort.
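The abstract says only that the students learn the teacher's output; it does not state the loss that is used. As a minimal sketch, assuming the widely used soft-target formulation (temperature-scaled softmax plus a KL term, mixed with the usual cross-entropy), the training objective for one sample could look like the following. The function names, `temperature`, and `alpha` are illustrative assumptions, not values or identifiers taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence.

    temperature and alpha are assumed hyperparameters (not from the paper).
    """
    # Hard-label term: cross-entropy of the student against the true class.
    student_probs = softmax(student_logits)
    hard = -math.log(student_probs[label])

    # Soft-target term: KL(teacher || student) at the raised temperature.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = sum(t * math.log(t / s)
               for t, s in zip(p_teacher, p_student) if t > 0)

    # The T^2 factor keeps the soft term's gradient scale comparable
    # across different temperatures.
    return alpha * hard + (1.0 - alpha) * temperature ** 2 * soft


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.0]
    print(distillation_loss([2.0, 1.0, 0.0], teacher, label=0))  # student agrees
    print(distillation_loss([0.0, 1.0, 2.0], teacher, label=0))  # student disagrees
```

In this sketch a student whose logits match the teacher's pays only the hard-label term, so the loss grows as the student's output drifts away from the teacher's.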

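Source journal: Neural Processing Letters (Engineering & Technology, Computer Science: Artificial Intelligence)
CiteScore: 4.90
Self-citation rate: 12.90%
Articles published: 392
Review time: 2.8 months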

About the journal: Neural Processing Letters is an international journal publishing research results and innovative ideas on all aspects of artificial neural networks. Coverage includes theoretical developments, biological models, new formal modes, learning, applications, software and hardware developments, and prospective researches. The journal promotes fast exchange of information in the community of neural network researchers and users. The resurgence of interest in the field of artificial neural networks since the beginning of the 1980s is coupled to tremendous research activity in specialized or multidisciplinary groups. Research, however, is not possible without good communication between people and the exchange of information, especially in a field covering such different areas; fast communication is also a key aspect, and this is the reason for Neural Processing Letters