Hardware accelerator for analytics of sparse data

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). Pub Date: 2016-03-14. DOI: 10.3850/9783981537079_0766

E. Nurvitadhi, Asit K. Mishra, Yu Wang, Ganesh Venkatesh, Debbie Marr


Citations: 19

Abstract

Rapid growth of the Internet has led to web applications that produce large, unstructured, sparse datasets (e.g., texts, ratings). Machine learning (ML) algorithms are the basis for many important analytics workloads that extract knowledge from these datasets. This paper characterizes such workloads on a high-end server for real-world datasets and shows that a set of sparse matrix operations dominates runtime. Further, these operations run inefficiently due to low compute-per-byte and challenging thread scaling behavior. As such, we propose a hardware accelerator to perform these operations with extreme efficiency. Simulations and RTL synthesis to a 14nm ASIC demonstrate significant performance and performance/Watt improvements over conventional processors, with only a small area overhead.
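The abstract does not name the specific sparse matrix operations or the storage format involved. As a minimal sketch of why such operations tend to have low compute-per-byte, the example below assumes sparse matrix-vector multiplication over a compressed sparse row (CSR) layout; the function names `csr_spmv` and `flops_per_byte`, the 8-byte values, and the 4-byte indices are all illustrative assumptions, not details from the paper or its accelerator.

```python
from typing import List, Sequence


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form (illustrative)."""
    num_rows = len(row_ptr) - 1
    y = [0.0] * num_rows
    for row in range(num_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # One multiply-add per nonzero; x is read at a data-dependent index.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def flops_per_byte(nnz: int, value_bytes: int = 8, index_bytes: int = 4) -> float:
    """Rough compute-per-byte estimate for nnz > 0 nonzeros.

    Assumes (hypothetically) that every nonzero streams one matrix value,
    one column index, and one element of x from memory, with no cache reuse.
    """
    flops = 2 * nnz  # one multiply and one add per nonzero
    bytes_moved = nnz * (value_bytes + index_bytes + value_bytes)
    return flops / bytes_moved


# Example matrix: [[1, 0, 2], [0, 0, 3], [4, 5, 0]]
values = [1.0, 2.0, 3.0, 4.0, 5.0]
col_idx = [0, 2, 2, 0, 1]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
ratio = flops_per_byte(len(values))
print(y, ratio)
```

Under these assumptions each nonzero contributes two floating-point operations but moves roughly 20 bytes, so the kernel is limited by memory traffic rather than arithmetic, which is consistent with the low compute-per-byte behavior the abstract describes.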

Source journal

2016 Design, Automation & Test in Europe Conference & Exhibition (DATE)

Self-citation rate

0.00%

Publication count