Improving Performance of Error-Tolerant Applications: A Case Study of Approximations on an Off-the-Shelf Neural Accelerator

Tomas Gonzalez-Aragon, Jorge Castro-Godínez
DOI: 10.1109/jocici54528.2021.9794353
2021 IEEE V Jornadas Costarricenses de Investigación en Computación e Informática (JoCICI), published 2021-10-25
Trending workloads and applications are driving many of the new advances in computer architecture and design paradigms. For instance, deep learning applications are transforming the way we do computing. On one hand, specialized hardware accelerators for these applications are now commercialized as neural processing units, achieving significant performance improvements. On the other hand, design paradigms such as approximate computing exploit the inherent tolerance of these applications to imprecise computations, reducing their computational complexity and yielding energy-efficient implementations. When an off-the-shelf specialized accelerator is paired with an edge computing platform, however, the applicable approximations are limited to the software layer. In this work, we present a case study of performance improvement obtained by introducing approximate computing techniques into three deep learning classification applications. Our test platform consists of a Raspberry Pi 4 as the edge computing device and a Movidius Myriad X as the neural accelerator. Our experimental results show that a mixture of approximation techniques achieves performance improvements of 20x to 48x, with no accuracy degradation, for a compute-intensive classification application.
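The abstract does not name the specific software-layer approximations used; as illustration only, the sketch below shows loop perforation, one classic technique in this family: skip a fraction of loop iterations in an error-tolerant computation, trading a small accuracy loss for proportionally less work. The function names and the stride parameter are hypothetical, not taken from the paper.

```python
# Illustrative loop perforation: a software-layer approximation that
# processes only every `stride`-th element. This is NOT necessarily one
# of the techniques combined in the paper; it is a generic example.

def mean_exact(xs):
    # Baseline: examine every element.
    return sum(xs) / len(xs)

def mean_perforated(xs, stride=4):
    # Perforated version: sample 1/stride of the elements,
    # doing roughly 1/stride of the work.
    sampled = xs[::stride]
    return sum(sampled) / len(sampled)

if __name__ == "__main__":
    data = list(range(1000))
    exact = mean_exact(data)            # 499.5
    approx = mean_perforated(data)      # 498.0
    rel_err = abs(exact - approx) / exact
    print(f"exact={exact}, approx={approx}, rel_err={rel_err:.3%}")
```

For error-tolerant workloads such as classification, a small numerical deviation like this often leaves the final predicted label unchanged, which is the property approximate computing exploits.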