High-Performance Deployment of Text Detection Model: Compression and Hardware Platform considerations

Nupur Sumeet, Karan Rawat, M. Nambiar
Published in: 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
Publication date: 2022-05-01
DOI: 10.1109/ISPASS55109.2022.00022 (https://doi.org/10.1109/ISPASS55109.2022.00022)
Citations: 0

Abstract

Network compression is often adopted for high-throughput implementation on commercial accelerators. We propose a heuristic-based approach to obtain compressed networks with a hardware-friendly architecture, as an alternative to conventional neural architecture search (NAS) algorithms, which are computationally expensive. The proposed compressed network achieves a 142$\times$ reduction in memory footprint and a throughput improvement of 5-8$\times$ on the target hardware platforms, while retaining accuracy within 5% of the baseline trained model. We report performance acceleration on CPU, GPU, and FPGAs for a text detection task.