Design and Implementation of a Fast Convolution Algorithm for Embedded Platform

Zhenyu Yin, Feiqing Zhang, Jiangbo Wang, Fulong Xu, Chao Fan
DOI: 10.1109/ICTech55460.2022.00041
Published in: 2022 11th International Conference of Information and Communication Technology (ICTech)
Publication date: 2022-02-01
URL: https://doi.org/10.1109/ICTech55460.2022.00041
Citations: 1

Abstract

In recent years, deep learning has been applied in industry with great success. As demand for lightweight intelligent devices grows, deploying deep learning models on embedded platforms to meet users' real-time requirements has become a trend. However, in pursuit of higher accuracy, existing deep learning frameworks are becoming richer in functionality and more computationally complex. Their large memory and compute requirements make it challenging to deploy neural network computing frameworks on embedded platforms with limited resources and computational power. To address real-time image processing, this paper proposes the WPOC algorithm, which is based on the Winograd algorithm, and integrates it into the Darknet framework. The implementation was tested successfully on the ZYNQ-7010 platform. The results show that WPOC speeds up image recognition by about six times under the VGG-16 model while maintaining the same accuracy.
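
The abstract does not describe how WPOC works internally. As background only, the sketch below shows the standard one-dimensional Winograd minimal filtering step F(2,3), the textbook building block behind Winograd convolution: it produces two outputs of a 3-tap filter using 4 multiplications instead of the 6 a direct computation needs. The function names `direct_f23` and `winograd_f23` are illustrative assumptions; they come from neither the paper nor Darknet, and this is not the authors' implementation.

```python
def direct_f23(d, g):
    """Direct 1D correlation: 2 outputs from 4 inputs and a 3-tap filter.

    Uses 6 multiplications.
    """
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same 2 outputs with 4 elementwise multiplications.

    Illustrative background only; not the WPOC algorithm itself.
    """
    # Filter transform. It depends only on g, so it can be
    # precomputed once per filter and reused for every input tile.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform: additions and subtractions only.
    v0 = d[0] - d[2]
    v1 = d[1] + d[2]
    v2 = d[2] - d[1]
    v3 = d[1] - d[3]

    # The four multiplications.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


if __name__ == "__main__":
    tile = [1.0, 2.0, 3.0, 4.0]
    taps = [1.0, 0.0, -1.0]
    print(direct_f23(tile, taps))
    print(winograd_f23(tile, taps))
```

In 2D, nesting this transform over rows and columns gives F(2x2, 3x3), which needs 16 multiplications per 2x2 output tile instead of 36. How much of the roughly sixfold speedup reported above comes from this reduction is not stated in the abstract.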