Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg
{"title":"深度学习中的软硬件性能","authors":"Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg","doi":"10.1049/pbpc022e_ch6","DOIUrl":null,"url":null,"abstract":"In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hardware and software performance in deep learning\",\"authors\":\"Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg\",\"doi\":\"10.1049/pbpc022e_ch6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.\",\"PeriodicalId\":254920,\"journal\":{\"name\":\"Many-Core Computing: Hardware and Software\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Many-Core Computing: Hardware and Software\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1049/pbpc022e_ch6\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Many-Core Computing: Hardware and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/pbpc022e_ch6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hardware and software performance in deep learning
In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource-hungry, requiring very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs, such as computation, parallelism, data locality and memory requirements. We describe research addressing these problems, including the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic, and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program-generation techniques to create high-performance parallel DNN programs.
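As a concrete illustration of the linear-algebra approach the chapter surveys, the sketch below shows the widely used im2col lowering, which reformulates a 2D convolution as a single matrix multiplication so that a tuned GEMM routine (e.g., a BLAS-backed library) does the heavy lifting. This is a minimal, illustrative example, not code from the chapter; the function names and the stride-1, no-padding assumptions are ours.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll a (C, H, W) input into a (C*kh*kw, out_h*out_w) patch matrix
    for a stride-1, no-padding convolution."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row holds one kernel tap taken at every output position.
                cols[row] = x[ch, i:i + out_h, j:j + out_w].ravel()
                row += 1
    return cols

def conv2d_gemm(x, weights):
    """Convolution as one matrix multiply; the @ call is where an
    optimized GEMM implementation would run."""
    m, c, kh, kw = weights.shape           # m filters, each c x kh x kw
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)               # (c*kh*kw, out_h*out_w)
    w_mat = weights.reshape(m, -1)         # (m, c*kh*kw), same tap order
    return (w_mat @ cols).reshape(m, out_h, out_w)

# Usage: a 3-channel 8x8 input convolved with four 3x3 filters.
x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(conv2d_gemm(x, w).shape)  # (4, 6, 6)
```

The trade-off, which the chapter's discussion of data locality and memory requirements touches on, is that im2col replicates each input element up to kh*kw times, raising memory traffic in exchange for the high arithmetic efficiency of a single large GEMM.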