Theoretical Scalability Analysis of Distributed Deep Convolutional Neural Networks

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). Pub Date: 2019-05-01. DOI: 10.1109/CCGRID.2019.00068

Adrián Castelló, M. F. Dolz, E. S. Quintana‐Ortí, J. Duato


Citations: 12

Abstract


We analyze the asymptotic performance of the training process of deep neural networks (NN) on clusters in order to determine the scalability. For this purpose, i) we assume a data parallel implementation of the training algorithm, which distributes the batches among the cluster nodes and replicates the model; ii) we leverage the roofline model to inspect the performance at the node level, taking into account the floating-point unit throughput and memory bandwidth; and iii) we consider distinct collective communication schemes that are optimal depending on the message size and underlying network interconnection topology. We then apply the resulting performance model to analyze the scalability of several well-known deep convolutional neural networks as a function of the batch size, node floating-point throughput, node memory bandwidth, cluster dimension, and link bandwidth.
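The three ingredients named in the abstract (data parallelism, a node-level roofline bound, and a collective communication cost) can be combined into a rough per-step time estimate. The sketch below is only an illustration under stated assumptions, not the authors' performance model: it uses the textbook roofline bound min(peak, intensity × bandwidth), picks the bandwidth term of a ring allreduce as one possible collective scheme, ignores latency, and assumes no overlap between computation and communication. All function names, parameter names, and numeric values are hypothetical.

```python
def roofline_flops(peak_flops, mem_bw, intensity):
    """Attainable FLOP/s under the roofline model: min(peak, intensity * bandwidth)."""
    return min(peak_flops, intensity * mem_bw)


def ring_allreduce_time(message_bytes, nodes, link_bw):
    """Bandwidth term of a ring allreduce, 2*(p-1)/p * n / B (latency ignored)."""
    if nodes == 1:
        return 0.0  # a single node has nothing to exchange
    return 2.0 * (nodes - 1) / nodes * message_bytes / link_bw


def step_time(flops_per_sample, bytes_per_sample, model_bytes,
              batch_size, nodes, peak_flops, mem_bw, link_bw):
    """Estimated time of one data-parallel step, assuming no compute/communication overlap."""
    local_batch = batch_size / nodes                  # the batch is split across nodes
    intensity = flops_per_sample / bytes_per_sample   # FLOPs per byte of memory traffic
    rate = roofline_flops(peak_flops, mem_bw, intensity)
    compute = local_batch * flops_per_sample / rate
    # The model is replicated, so gradients of size model_bytes are reduced every step.
    return compute + ring_allreduce_time(model_bytes, nodes, link_bw)


def speedup(nodes, **params):
    """Speedup of `nodes` nodes over a single node for a fixed global batch size."""
    return step_time(nodes=1, **params) / step_time(nodes=nodes, **params)


if __name__ == "__main__":
    # Hypothetical workload and hardware figures, chosen only for illustration.
    params = dict(flops_per_sample=1e10, bytes_per_sample=1e8, model_bytes=1e8,
                  batch_size=256, peak_flops=1e13, mem_bw=1e12, link_bw=1e10)
    for p in (1, 2, 4, 8, 16):
        print(p, round(speedup(p, **params), 2))
```

With a fixed global batch, the compute term shrinks as nodes are added while the allreduce term approaches a constant, so the estimated speedup flattens. This is the kind of dependence on batch size, node throughput, memory bandwidth, cluster size, and link bandwidth that the abstract says the paper analyzes.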
