量化YOLOv7:一项综合研究

2023 28th International Computer Conference, Computer Society of Iran (CSICC) Pub Date : 2023-01-25 DOI:10.1109/CSICC58665.2023.10105310

Mohammadamin Baghbanbashi, Mohsen Raji, B. Ghavami

{"title":"量化YOLOv7:一项综合研究","authors":"Mohammadamin Baghbanbashi, Mohsen Raji, B. Ghavami","doi":"10.1109/CSICC58665.2023.10105310","DOIUrl":null,"url":null,"abstract":"YOLO is a deep neural network (DNN) model presented for robust real-time object detection following the one-stage inference approach. It outperforms other real-time object detectors in terms of speed and accuracy by a wide margin. Nevertheless, since YOLO is developed upon a DNN backbone with numerous parameters, it will cause excessive memory load, thereby deploying it on memory-constrained devices is a severe challenge in practice. To overcome this limitation, model compression techniques, such as quantizing parameters to lower-precision values, can be adopted. As the most recent version of YOLO, YOLOv7 achieves such state-of-the-art performance in speed and accuracy in the range of 5 FPS to 160 FPS that it surpasses all former versions of YOLO and other existing models in this regard. So far, the robustness of several quantization schemes has been evaluated on older versions of YOLO. These methods may not necessarily yield similar results for YOLOv7 as it utilizes a different architecture. In this paper, we conduct in-depth research on the effectiveness of a variety of quantization schemes on the pre-trained weights of the state-of-the-art YOLOv7 model. Experimental results demonstrate that using 4-bit quantization coupled with the combination of different granularities results in ~3.92x and ~3.86x memory-saving for uniform and non-uniform quantization, respectively, with only 2.5% and 1% accuracy loss compared to the full-precision baseline model.","PeriodicalId":127277,"journal":{"name":"2023 28th International Computer Conference, Computer Society of Iran (CSICC)","volume":"1209 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Quantizing YOLOv7: A Comprehensive Study\",\"authors\":\"Mohammadamin Baghbanbashi, Mohsen Raji, B. Ghavami\",\"doi\":\"10.1109/CSICC58665.2023.10105310\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"YOLO is a deep neural network (DNN) model presented for robust real-time object detection following the one-stage inference approach. It outperforms other real-time object detectors in terms of speed and accuracy by a wide margin. Nevertheless, since YOLO is developed upon a DNN backbone with numerous parameters, it will cause excessive memory load, thereby deploying it on memory-constrained devices is a severe challenge in practice. To overcome this limitation, model compression techniques, such as quantizing parameters to lower-precision values, can be adopted. As the most recent version of YOLO, YOLOv7 achieves such state-of-the-art performance in speed and accuracy in the range of 5 FPS to 160 FPS that it surpasses all former versions of YOLO and other existing models in this regard. So far, the robustness of several quantization schemes has been evaluated on older versions of YOLO. These methods may not necessarily yield similar results for YOLOv7 as it utilizes a different architecture. In this paper, we conduct in-depth research on the effectiveness of a variety of quantization schemes on the pre-trained weights of the state-of-the-art YOLOv7 model. Experimental results demonstrate that using 4-bit quantization coupled with the combination of different granularities results in ~3.92x and ~3.86x memory-saving for uniform and non-uniform quantization, respectively, with only 2.5% and 1% accuracy loss compared to the full-precision baseline model.\",\"PeriodicalId\":127277,\"journal\":{\"name\":\"2023 28th International Computer Conference, Computer Society of Iran (CSICC)\",\"volume\":\"1209 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-01-25\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 28th International Computer Conference, Computer Society of Iran (CSICC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CSICC58665.2023.10105310\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 28th International Computer Conference, Computer Society of Iran (CSICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CSICC58665.2023.10105310","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

YOLO是一种深度神经网络(DNN)模型，采用单阶段推理方法，用于鲁棒实时目标检测。在速度和准确性方面，它大大优于其他实时对象检测器。然而，由于YOLO是在具有众多参数的DNN骨干网上开发的，它将导致过度的内存负载，因此在实践中将其部署在内存受限的设备上是一个严峻的挑战。为了克服这一限制，可以采用模型压缩技术，例如将参数量化为较低精度值。作为YOLO的最新版本，YOLOv7在速度和精度方面达到了最先进的性能，在5帧/秒到160帧/秒的范围内，超过了所有以前的YOLO版本和其他现有型号。到目前为止，几种量化方案的鲁棒性已经在旧版本的YOLO上进行了评估。对于YOLOv7，这些方法不一定产生类似的结果，因为它使用不同的体系结构。在本文中，我们深入研究了各种量化方案对最先进的YOLOv7模型预训练权值的有效性。实验结果表明，与全精度基线模型相比，使用4位量化与不同粒度组合相结合，均匀量化和非均匀量化分别节省了~3.92倍和~3.86倍的内存，精度损失仅为2.5%和1%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Quantizing YOLOv7: A Comprehensive Study

YOLO is a deep neural network (DNN) model presented for robust real-time object detection following the one-stage inference approach. It outperforms other real-time object detectors in terms of speed and accuracy by a wide margin. Nevertheless, since YOLO is developed upon a DNN backbone with numerous parameters, it will cause excessive memory load, thereby deploying it on memory-constrained devices is a severe challenge in practice. To overcome this limitation, model compression techniques, such as quantizing parameters to lower-precision values, can be adopted. As the most recent version of YOLO, YOLOv7 achieves such state-of-the-art performance in speed and accuracy in the range of 5 FPS to 160 FPS that it surpasses all former versions of YOLO and other existing models in this regard. So far, the robustness of several quantization schemes has been evaluated on older versions of YOLO. These methods may not necessarily yield similar results for YOLOv7 as it utilizes a different architecture. In this paper, we conduct in-depth research on the effectiveness of a variety of quantization schemes on the pre-trained weights of the state-of-the-art YOLOv7 model. Experimental results demonstrate that using 4-bit quantization coupled with the combination of different granularities results in ~3.92x and ~3.86x memory-saving for uniform and non-uniform quantization, respectively, with only 2.5% and 1% accuracy loss compared to the full-precision baseline model.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2023 28th International Computer Conference, Computer Society of Iran (CSICC)

自引率

0.00%

发文量