High-Performance Deployment of Text Detection Model: Compression and Hardware Platform considerations

2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), published 2022-05-01. DOI: 10.1109/ISPASS55109.2022.00022

Nupur Sumeet, Karan Rawat, M. Nambiar


Citations: 0

Abstract

Network compression is often adopted to achieve high-throughput implementations on commercial accelerators. We propose a heuristic-based approach to obtain compressed networks with a hardware-friendly architecture, as an alternative to conventional neural architecture search (NAS) algorithms, which are computationally expensive. The proposed compressed network achieves a 142× reduction in memory footprint and a 5–8× throughput improvement on the target hardware platforms, while retaining accuracy within 5% of the baseline trained model. We report performance acceleration on CPUs, GPUs, and FPGAs for a text detection task.
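The abstract does not describe the heuristic itself, so what follows is only a minimal sketch of one common hardware-friendly compression heuristic, assuming a plain stack of convolutions: scale every hidden layer's channel count by a single width multiplier, round channel counts to multiples of 8 (an assumed alignment rule for vector and tensor units), and store weights at 8 bits. It shows how a memory-footprint reduction factor such as the reported 142× is computed (fp32 baseline bytes divided by compressed bytes). The `ConvLayer` type, the `heuristic_compress` function, the toy layer sizes, and the 20× target are all hypothetical; this is not the authors' method, network, or accuracy procedure.

```python
# Hypothetical sketch (not the paper's method): heuristic width scaling for a
# hardware-friendly compressed CNN, scored by memory-footprint reduction.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ConvLayer:
    in_ch: int
    out_ch: int
    kernel: int  # square k x k kernel


def round_to_multiple(ch: float, multiple: int = 8) -> int:
    # Assumed hardware-friendly rule: channel counts are multiples of `multiple`
    # so they map evenly onto SIMD lanes / tensor-core tiles.
    return max(multiple, int(round(ch / multiple)) * multiple)


def scale_widths(layers: List[ConvLayer], alpha: float, multiple: int = 8) -> List[ConvLayer]:
    # Scale hidden widths by alpha; the network input channels and the final
    # (task-specific) output channels are kept unchanged.
    scaled: List[ConvLayer] = []
    prev_out = layers[0].in_ch
    for i, layer in enumerate(layers):
        if i == len(layers) - 1:
            out_ch = layer.out_ch
        else:
            out_ch = round_to_multiple(layer.out_ch * alpha, multiple)
        scaled.append(ConvLayer(prev_out, out_ch, layer.kernel))
        prev_out = out_ch
    return scaled


def footprint_bytes(layers: List[ConvLayer], bits: int = 32) -> int:
    # Weights plus biases, each stored with `bits` bits.
    params = sum(layer.in_ch * layer.out_ch * layer.kernel ** 2 + layer.out_ch for layer in layers)
    return params * bits // 8


def heuristic_compress(
    layers: List[ConvLayer], target_ratio: float, bits: int = 8, multiple: int = 8
) -> Tuple[float, List[ConvLayer], float]:
    # Reduction is measured against the fp32 baseline.
    base = footprint_bytes(layers, bits=32)
    # Walk width multipliers from widest to narrowest and return the first
    # (i.e. largest) network that meets the target; accuracy is NOT checked here.
    for step in range(100, 0, -1):
        alpha = step / 100
        candidate = scale_widths(layers, alpha, multiple)
        ratio = base / footprint_bytes(candidate, bits)
        if ratio >= target_ratio:
            return alpha, candidate, ratio
    raise ValueError("target ratio not reachable by width scaling alone")


if __name__ == "__main__":
    # Toy 4-layer detector with hypothetical sizes; the head emits 6 output maps.
    net = [ConvLayer(3, 64, 3), ConvLayer(64, 128, 3),
           ConvLayer(128, 256, 3), ConvLayer(256, 6, 1)]
    alpha, small, ratio = heuristic_compress(net, target_ratio=20.0, bits=8)
    print(f"alpha={alpha:.2f}, widths={[l.out_ch for l in small]}, reduction={ratio:.1f}x")
```

In a real flow each candidate would also be fine-tuned and rejected if accuracy fell beyond a tolerance (the abstract reports staying within 5% of the baseline); that step is omitted here. Searching from the widest candidate downward keeps this sketch cheap: each candidate costs one footprint calculation rather than a training run, which is the kind of saving a heuristic offers over a full NAS.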


Source journal: 2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
