Accelerating the Run-Time of Convolutional Neural Networks through Weight Pruning and Quantization

Rajai Alhimdiat, W. Ashour, Ramy Battrawy, D. Stricker
{"title":"Accelerating the Run-Time of Convolutional Neural Networks through Weight Pruning and Quantization","authors":"Rajai Alhimdiat, W. Ashour, Ramy Battrawy, D. Stricker","doi":"10.1109/ieCRES57315.2023.10209460","DOIUrl":null,"url":null,"abstract":"Accelerating the processing of Convolutional Neural Networks (CNNs) is highly demand in the field of Artificial Intelligence (AI), particularly in computer vision domains. The efficiency of memory resources is crucial in measuring run-time, and weight pruning and quantization techniques have been studied extensively to optimize this efficiency. In this work, we investigate the contribution of these techniques to accelerate a pre-trained CNN model. We adapt the percentile-based weights pruning with focusing on unstructured pruning by dynamically adjusting the pruning thresholds based on the fine-tuning performance of the model. In the same context, we perform uniform quantization for presenting the weights values of the model’s parameters with a fixed number of bits. We implement different levels of post-training and aware-training -fine-tuning the model with the same learning rate and number of epochs as the original. We then refine-tune the model with a lower learning rate and a factor of 10x for both techniques. Finally, we combine the best levels of pruning and quantization and refine-tune the model to explore the best-pruned and quantized pre-trained model. We evaluate each level of the techniques and analyze their trade-offs. Our results demonstrate the effectiveness of our strategy in accelerating the CNN and improving its efficiency, and provide insights into the best combination of techniques to accelerate its inference time.","PeriodicalId":431920,"journal":{"name":"2023 8th International Engineering Conference on Renewable Energy & Sustainability (ieCRES)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 8th International Engineering Conference on Renewable Energy & Sustainability (ieCRES)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ieCRES57315.2023.10209460","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Accelerating the processing of Convolutional Neural Networks (CNNs) is in high demand in the field of Artificial Intelligence (AI), particularly in computer vision domains. The efficiency of memory resources is crucial to run-time, and weight pruning and quantization techniques have been studied extensively to optimize this efficiency. In this work, we investigate the contribution of these techniques to accelerating a pre-trained CNN model. We adapt percentile-based weight pruning, focusing on unstructured pruning, by dynamically adjusting the pruning thresholds based on the fine-tuning performance of the model. In the same context, we apply uniform quantization to represent the model's weight values with a fixed number of bits. We implement different levels of post-training and training-aware fine-tuning: we first fine-tune the model with the same learning rate and number of epochs as the original, and then re-fine-tune it with a learning rate lowered by a factor of 10 for both techniques. Finally, we combine the best levels of pruning and quantization and re-fine-tune the model to obtain the best pruned and quantized pre-trained model. We evaluate each level of the techniques and analyze their trade-offs. Our results demonstrate the effectiveness of our strategy in accelerating the CNN and improving its efficiency, and provide insights into the best combination of techniques for reducing its inference time.
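
The following is a minimal PyTorch sketch, not the authors' implementation, illustrating the two techniques the abstract describes: percentile-based unstructured weight pruning and uniform quantization of weights with a fixed number of bits. The model, percentile value, bit width, and the 10x learning-rate reduction shown in the usage comments are assumed placeholders.

# Sketch of percentile-based unstructured pruning and uniform weight
# quantization for a pre-trained CNN. Values below are illustrative only.
import torch
import torch.nn as nn


def percentile_prune(model: nn.Module, percentile: float) -> None:
    """Zero out weights whose magnitude falls below the given percentile,
    layer by layer (unstructured pruning)."""
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                w = module.weight.data
                threshold = torch.quantile(w.abs().flatten(), percentile / 100.0)
                mask = (w.abs() >= threshold).float()
                module.weight.data = w * mask


def uniform_quantize(model: nn.Module, num_bits: int) -> None:
    """Represent each layer's weights with a fixed number of bits using
    symmetric uniform quantization (quantize, then de-quantize in place)."""
    levels = 2 ** (num_bits - 1) - 1
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, (nn.Conv2d, nn.Linear)):
                w = module.weight.data
                scale = w.abs().max() / levels
                if scale > 0:
                    module.weight.data = torch.round(w / scale).clamp(-levels, levels) * scale


# Example usage on a pre-trained CNN (hypothetical values):
#   model = torchvision.models.resnet18(weights="IMAGENET1K_V1")
#   percentile_prune(model, percentile=60.0)   # prune the 60% smallest-magnitude weights
#   uniform_quantize(model, num_bits=8)        # 8-bit uniform quantization
#   # Re-fine-tune with the learning rate lowered by a factor of 10, e.g.:
#   optimizer = torch.optim.SGD(model.parameters(), lr=original_lr / 10)

In practice the pruning percentile would be raised or lowered between fine-tuning rounds according to the model's validation performance, which is the dynamic threshold adjustment the abstract refers to.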