{"title":"Optimizing Deep Learning Models for Resource-Constrained Environments With Cluster-Quantized Knowledge Distillation","authors":"Niaz Ashraf Khan, A. M. Saadman Rafat","doi":"10.1002/eng2.70187","DOIUrl":null,"url":null,"abstract":"<p>Deep convolutional neural networks (CNNs) are highly effective in computer vision tasks but remain challenging to deploy in resource-constrained environments due to their high computational and memory requirements. Conventional model compression techniques, such as pruning and post-training quantization, often compromise model accuracy by decoupling compression from training. Furthermore, traditional knowledge distillation approaches rely on full-precision teacher models, limiting their effectiveness in compressed settings. To address these issues, we propose Cluster-Quantized Knowledge Distillation (CQKD), a novel framework that integrates structured pruning with knowledge distillation, incorporating cluster-based weight quantization directly into the training loop. Unlike existing methods, CQKD applies quantization to both the teacher and student models, ensuring a more effective transfer of compressed knowledge. By leveraging layer-wise K-means clustering, our approach achieves extreme model compression while maintaining high accuracy. Experimental results on CIFAR-10 and CIFAR-100 demonstrate the effectiveness of CQKD, achieving compression ratios of 34,000× while preserving competitive accuracy—97.9% on CIFAR-10 and 91.2% on CIFAR-100. These results highlight the feasibility of CQKD for efficient deep learning model deployment in low-resource environments.</p>","PeriodicalId":72922,"journal":{"name":"Engineering reports : open access","volume":"7 5","pages":""},"PeriodicalIF":1.8000,"publicationDate":"2025-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://onlinelibrary.wiley.com/doi/epdf/10.1002/eng2.70187","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Engineering reports : open access","FirstCategoryId":"1085","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/eng2.70187","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Abstract
Deep convolutional neural networks (CNNs) are highly effective in computer vision tasks but remain challenging to deploy in resource-constrained environments due to their high computational and memory requirements. Conventional model compression techniques, such as pruning and post-training quantization, often compromise model accuracy by decoupling compression from training. Furthermore, traditional knowledge distillation approaches rely on full-precision teacher models, limiting their effectiveness in compressed settings. To address these issues, we propose Cluster-Quantized Knowledge Distillation (CQKD), a novel framework that integrates structured pruning with knowledge distillation, incorporating cluster-based weight quantization directly into the training loop. Unlike existing methods, CQKD applies quantization to both the teacher and student models, ensuring a more effective transfer of compressed knowledge. By leveraging layer-wise K-means clustering, our approach achieves extreme model compression while maintaining high accuracy. Experimental results on CIFAR-10 and CIFAR-100 demonstrate the effectiveness of CQKD, achieving compression ratios of 34,000× while preserving competitive accuracy—97.9% on CIFAR-10 and 91.2% on CIFAR-100. These results highlight the feasibility of CQKD for efficient deep learning model deployment in low-resource environments.
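The abstract names two core ingredients, layer-wise K-means clustering of weights and knowledge distillation between a quantized teacher and a quantized student, but gives no implementation details. The sketch below is only an illustrative approximation under assumed choices (scikit-learn's KMeans for clustering, 16 clusters per layer, a standard temperature-scaled KD loss with an assumed temperature and mixing weight); it is not the authors' CQKD code, and it shows one-shot layer quantization rather than the in-training-loop integration the paper describes.

```python
# Illustrative sketch (assumed details, not the paper's implementation):
# layer-wise K-means weight clustering plus a standard distillation loss
# applied between a quantized teacher and a quantized student.

import torch
import torch.nn as nn
import torch.nn.functional as F
from sklearn.cluster import KMeans


def cluster_quantize_layer(weight: torch.Tensor, n_clusters: int = 16) -> torch.Tensor:
    """Replace each weight in one layer by its nearest K-means centroid.

    A layer reduced to n_clusters distinct values can be stored as a small
    codebook plus per-weight cluster indices, which is the source of the
    compression described in the abstract.
    """
    flat = weight.detach().cpu().numpy().reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    centroids = km.cluster_centers_.reshape(-1)           # codebook of n_clusters values
    indices = km.labels_                                   # per-weight cluster index
    quantized = centroids[indices].reshape(tuple(weight.shape))
    return torch.from_numpy(quantized).to(weight.device, weight.dtype)


def quantize_model_layerwise(model: nn.Module, n_clusters: int = 16) -> None:
    """Apply cluster quantization independently to every conv and linear layer."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                module.weight.copy_(cluster_quantize_layer(module.weight, n_clusters))


def distillation_loss(student_logits, teacher_logits, targets,
                      temperature: float = 4.0, alpha: float = 0.7):
    """Standard KD objective: temperature-softened KL term against the
    (quantized) teacher plus a hard cross-entropy term on the labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * (temperature ** 2)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```

In a training loop, `quantize_model_layerwise` would be applied to both the teacher and the student (per the abstract, quantization is imposed on both models), and `distillation_loss` would drive the student's updates; how often the clustering is re-run during training, and how it interacts with the structured pruning step, is specific to CQKD and not reproduced here.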