ARGA

Proceedings of the 56th Annual Design Automation Conference 2019. Pub Date: 2019-06-02. DOI: 10.1145/3316781.3317776

Daniel Peroni, M. Imani, Hamid Nejatollahi, N. Dutt, Tajana Rosing


Citations: 7

Abstract

Many data-driven applications, including computer vision, speech recognition, and medical diagnostics, tolerate error during computation. These applications are often accelerated on GPUs, but high computational costs limit performance and increase energy usage. In this paper, we present ARGA, an approximate computing technique capable of accelerating GPGPU applications. ARGA provides an approximate lookup table to GPGPU cores to avoid recomputing instructions with identical or similar values. We propose multi-table parallel lookup, which enables computational reuse to significantly speed up GPGPU computation by checking incoming instructions in parallel. The inputs of each operation are searched for in a lookup table. Matches that yield an exact or low-error result are removed from the floating-point pipeline and used directly as output. Matches producing highly inaccurate results are computed on exact hardware to minimize application error. We simulate our design by placing ARGA within each core of an Nvidia Kepler architecture Titan and an AMD Southern Islands 7970. We show our design improves throughput by up to $2.7 \times$ and improves energy-delay product (EDP) by $5.3 \times$ for 6 GPGPU applications while maintaining less than 5% output error. We also show ARGA accelerates inference of a LeNet NN by $2.1 \times$ and improves EDP by $3.7 \times$ without significantly impacting classification accuracy.

CCS CONCEPTS: • Computer systems organization $\rightarrow$ Multicore architectures; • Computing methodologies $\rightarrow$ Machine learning approaches.
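The reuse idea in the abstract (look up an operation's inputs, return a stored result on an exact or near match, otherwise compute exactly) can be pictured with a small software analogy. This is only a sketch under stated assumptions, not the ARGA hardware design: the abstract does not say how similarity is measured or how tables are managed, so the mantissa-truncation matching rule, the FIFO replacement policy, and the names `truncate_mantissa`, `ApproxLookupTable`, `dropped_bits`, and `capacity` are all hypothetical choices made for illustration.

```python
import struct


def truncate_mantissa(x: float, dropped_bits: int) -> int:
    """Return the float32 bit pattern of x with its low mantissa bits cleared.

    Assumption: two inputs count as "similar" when they agree on everything
    except the lowest `dropped_bits` mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = 0xFFFFFFFF & ~((1 << dropped_bits) - 1)
    return bits & mask


class ApproxLookupTable:
    """Software analogy of an approximate reuse table for a two-input operation."""

    def __init__(self, op, dropped_bits=12, capacity=64):
        self.op = op                      # exact operation, used on a miss
        self.dropped_bits = dropped_bits  # hypothetical similarity knob
        self.capacity = capacity          # hypothetical table size
        self.table = {}                   # dicts keep insertion order, giving FIFO
        self.hits = 0
        self.misses = 0

    def __call__(self, a: float, b: float) -> float:
        key = (
            truncate_mantissa(a, self.dropped_bits),
            truncate_mantissa(b, self.dropped_bits),
        )
        if key in self.table:
            # Exact or similar inputs were seen before: reuse the stored result.
            self.hits += 1
            return self.table[key]
        # Inputs are too far from any stored entry: compute exactly and store.
        self.misses += 1
        result = self.op(a, b)
        if len(self.table) >= self.capacity:
            oldest_key = next(iter(self.table))
            del self.table[oldest_key]
        self.table[key] = result
        return result
```

With `dropped_bits=12`, calling a multiply table on `(1.5, 2.0)` and then on `(1.5001, 2.0)` reuses the first result, because the two left operands differ only in the low mantissa bits, while `(1.6, 2.0)` misses and is computed exactly. Raising `dropped_bits` trades accuracy for a higher hit rate, which mirrors the error-versus-speedup trade-off the abstract describes.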
