Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg
{"title":"深度学习中的软硬件性能","authors":"Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg","doi":"10.1049/pbpc022e_ch6","DOIUrl":null,"url":null,"abstract":"In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.","PeriodicalId":254920,"journal":{"name":"Many-Core Computing: Hardware and Software","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Hardware and software performance in deep learning\",\"authors\":\"Andrew Anderson, James Garland, Yuan Wen, B. Barabasz, Kaveena Persand, Aravind Vasudevan, David Gregg\",\"doi\":\"10.1049/pbpc022e_ch6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource hungry and require very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs such as computation, parallelism, data locality and memory requirements. We describe research on these problems, such as the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program generation techniques to create high-performance parallel DNN programs.\",\"PeriodicalId\":254920,\"journal\":{\"name\":\"Many-Core Computing: Hardware and Software\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Many-Core Computing: Hardware and Software\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1049/pbpc022e_ch6\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Many-Core Computing: Hardware and Software","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1049/pbpc022e_ch6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Hardware and software performance in deep learning
In recent years, deep neural networks (DNNs) have emerged as the most successful technology for many difficult problems in image, video, voice and text processing. DNNs are resource-hungry, requiring very large amounts of computation and memory, which is a particular challenge on IoT, mobile and embedded systems. In this chapter, we outline some major performance challenges of DNNs, such as computation, parallelism, data locality and memory requirements. We describe research addressing these problems, including the use of existing high-performance linear algebra libraries, hardware acceleration, reduced-precision storage and arithmetic, and sparse data representations. Finally, we discuss recent trends in adapting compiler and domain-specific program-generation techniques to create high-performance parallel DNN programs.
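As a concrete illustration of the linear-algebra approach the chapter surveys, the sketch below shows the widely used im2col lowering, which reformulates a 2D convolution as a single matrix multiplication so that a tuned GEMM routine (e.g., a BLAS-backed library) does the heavy lifting. This is a minimal, illustrative example, not code from the chapter; the function names and the stride-1, no-padding assumptions are ours.

```python
import numpy as np

def im2col(x, kh, kw):
    """Unroll a (C, H, W) input into a (C*kh*kw, out_h*out_w) patch matrix
    for a stride-1, no-padding convolution."""
    c, h, w = x.shape
    out_h, out_w = h - kh + 1, w - kw + 1
    cols = np.empty((c * kh * kw, out_h * out_w), dtype=x.dtype)
    row = 0
    for ch in range(c):
        for i in range(kh):
            for j in range(kw):
                # Each row holds one kernel tap taken at every output position.
                cols[row] = x[ch, i:i + out_h, j:j + out_w].ravel()
                row += 1
    return cols

def conv2d_gemm(x, weights):
    """Convolution as one matrix multiply; the @ call is where an
    optimized GEMM implementation would run."""
    m, c, kh, kw = weights.shape           # m filters, each c x kh x kw
    out_h = x.shape[1] - kh + 1
    out_w = x.shape[2] - kw + 1
    cols = im2col(x, kh, kw)               # (c*kh*kw, out_h*out_w)
    w_mat = weights.reshape(m, -1)         # (m, c*kh*kw), same tap order
    return (w_mat @ cols).reshape(m, out_h, out_w)

# Usage: a 3-channel 8x8 input convolved with four 3x3 filters.
x = np.random.rand(3, 8, 8).astype(np.float32)
w = np.random.rand(4, 3, 3, 3).astype(np.float32)
print(conv2d_gemm(x, w).shape)  # (4, 6, 6)
```

The trade-off, which the chapter's discussion of data locality and memory requirements touches on, is that im2col replicates each input element up to kh*kw times, raising memory traffic in exchange for the high arithmetic efficiency of a single large GEMM.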