DjiNN and Tonic: DNN as a service and its implications for future warehouse scale computers

2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA), pp. 27-40. Publication date: 2015-06-13. DOI: 10.1145/2749469.2749472

Johann Hauswald, Yiping Kang, M. Laurenzano, Quan Chen, Cheng Li, T. Mudge, R. Dreslinski, Jason Mars, Lingjia Tang


Citations: 174

Abstract

As applications such as Apple Siri, Google Now, Microsoft Cortana, and Amazon Echo continue to gain traction, web service companies are adopting large deep neural networks (DNNs) for machine learning challenges such as image processing, speech recognition, and natural language processing. A number of open questions arise as to the design of a server platform specialized for DNNs and how modern warehouse scale computers (WSCs) should be outfitted to provide DNN as a service for these applications. In this paper, we present DjiNN, an open infrastructure for DNN as a service in WSCs, and Tonic Suite, a suite of 7 end-to-end applications that span image, speech, and language processing. We use DjiNN to design a high-throughput DNN system based on massive GPU server designs and provide insights into the varying characteristics across applications. After studying the throughput, bandwidth, and power properties of DjiNN and Tonic Suite, we investigate several design points for future WSC architectures. In particular, we investigate the total cost of ownership implications of a WSC with a disaggregated GPU pool versus a WSC composed of homogeneous integrated GPU servers. We improve DNN throughput by over 120× for all but one application (40× for Facial Recognition) on an NVIDIA K40 GPU. On a GPU server composed of 8 NVIDIA K40s, we achieve near-linear scaling (around 1000× throughput improvement) for 3 of the 7 applications. Through our analysis, we also find that GPU-enabled WSCs improve total cost of ownership over CPU-only designs by 4-20×, depending on the composition of the workload.
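
The "near-linear scaling" claim can be checked with simple arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, and the inputs are the rounded figures quoted in the abstract (a single-GPU improvement of "over 120×", 8 GPUs, and "around 1000×" on the full server).

```python
def ideal_speedup(single_gpu_speedup: float, num_gpus: int) -> float:
    """Speedup over the baseline if throughput scaled perfectly linearly with GPU count."""
    return single_gpu_speedup * num_gpus


def scaling_efficiency(observed_speedup: float,
                       single_gpu_speedup: float,
                       num_gpus: int) -> float:
    """Ratio of the observed multi-GPU speedup to the ideal linear speedup."""
    return observed_speedup / ideal_speedup(single_gpu_speedup, num_gpus)


# Rounded figures from the abstract (assumed inputs, not measured here).
SINGLE_GPU_SPEEDUP = 120.0   # stated as "over 120x", so this is a lower bound
NUM_GPUS = 8
OBSERVED_SPEEDUP = 1000.0    # stated as "around 1000x"

ideal = ideal_speedup(SINGLE_GPU_SPEEDUP, NUM_GPUS)
efficiency = scaling_efficiency(OBSERVED_SPEEDUP, SINGLE_GPU_SPEEDUP, NUM_GPUS)

print(f"ideal linear speedup: {ideal:.0f}x")
print(f"observed / ideal:     {efficiency:.2f}")
```

Because 120× is only a lower bound on the single-GPU improvement, a ratio slightly above 1 here does not imply super-linear scaling; it only shows that the quoted figures are consistent with scaling close to 8× across the 8 GPUs.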
