{"title":"High-Performance Deployment of Text Detection Model: Compression and Hardware Platform considerations","authors":"Nupur Sumeet, Karan Rawat, M. Nambiar","doi":"10.1109/ISPASS55109.2022.00022","DOIUrl":null,"url":null,"abstract":"Network compression is often adopted for high throughput implementation on commercial accelerators. We propose a heuristic based approach to obtain compressed networks with a hardware-friendly architecture as an alternative to conventional NAS algorithms that are computationally expensive. The proposed compressed network introduces 142 $\\times$ memory-footprint reduction and provide throughput improvement of 5-8 $\\times$ on target hardware platforms, while retaining accuracy within 5% of the baseline trained model. We report performance acceleration on CPU, GPU, and FPGAs for a text detection task.","PeriodicalId":115391,"journal":{"name":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPASS55109.2022.00022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Network compression is often adopted for high-throughput implementation on commercial accelerators. We propose a heuristic-based approach to obtain compressed networks with a hardware-friendly architecture, as an alternative to conventional NAS algorithms that are computationally expensive. The proposed compressed network achieves a 142$\times$ memory-footprint reduction and a 5-8$\times$ throughput improvement on target hardware platforms, while retaining accuracy within 5% of the baseline trained model. We report performance acceleration on CPUs, GPUs, and FPGAs for a text detection task.
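The paper does not spell out its heuristic in the abstract, so the following is only a minimal sketch of the general idea of a greedy, accuracy-constrained width-reduction search of the kind such compression approaches use. All names here (compress_widths, toy_accuracy, min_width) are illustrative assumptions, and the accuracy proxy is a placeholder for a real fine-tune-and-evaluate step; this is not the authors' algorithm.

```python
# Hypothetical sketch: greedily halve per-layer channel counts while an
# accuracy estimate stays within a fixed budget of the baseline.
from typing import Callable, List


def compress_widths(
    base_widths: List[int],
    estimate_accuracy: Callable[[List[int]], float],
    baseline_accuracy: float,
    max_drop: float = 0.05,   # keep accuracy within 5% of the baseline
    min_width: int = 8,       # hardware-friendly lower bound on channels
) -> List[int]:
    """Greedy heuristic: shrink one layer at a time, accept the smaller
    width only if the estimated accuracy drop stays within budget."""
    widths = list(base_widths)
    for i in range(len(widths)):
        while widths[i] // 2 >= min_width:
            trial = widths.copy()
            trial[i] //= 2
            if baseline_accuracy - estimate_accuracy(trial) <= max_drop:
                widths = trial    # accept the smaller layer
            else:
                break             # too much accuracy loss; move on
    return widths


if __name__ == "__main__":
    base = [64, 128, 256, 256, 512]

    # Toy accuracy proxy for demonstration only; a real pipeline would
    # retrain/fine-tune the pruned network and evaluate on validation data.
    def toy_accuracy(widths: List[int]) -> float:
        kept = sum(widths) / sum(base)
        return 0.90 * (0.97 + 0.03 * kept)

    compressed = compress_widths(base, toy_accuracy, baseline_accuracy=0.90)
    # Rough parameter-count proxy (channels squared per layer).
    reduction = sum(w * w for w in base) / sum(w * w for w in compressed)
    print(f"widths: {compressed}, approx. parameter reduction: {reduction:.1f}x")
```

With a real accuracy estimator the loop would stop shrinking each layer as soon as the 5% budget is hit, yielding a smaller, hardware-friendly architecture without the search cost of a full NAS run.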