A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks

S. Barrachina, Adrián Castelló, M. Catalán, M. F. Dolz, José I. Mestre
{"title":"面向研究的深度神经网络分布式训练灵活框架","authors":"S. Barrachina, Adrián Castelló, M. Catalán, M. F. Dolz, José I. Mestre","doi":"10.1109/IPDPSW52791.2021.00110","DOIUrl":null,"url":null,"abstract":"We present PyDTNN, a framework for training deep neural networks (DNNs) on clusters of computers that has been designed as a research-oriented tool with a low learning curve. Our parallel training framework offers a set of functionalities that cover several must-have features for advanced deep learning (DL) software: 1) it is developed in Python in order to expose an accessible entry point for the newcomer; 2) it is extensible, allowing users to prototype new research ideas without requiring them to deal with complex software-stacks; and 3) it delivers high parallel performance, exploiting MPI via mpi4py/NCCL for communication; and NumPy, cuDNN, and cuBLAS for computation.This paper provides practical evidence that PyDTNN attains similar accuracy and parallel performance to those exhibited by Google’s TensorFlow (TF), though we recognize that PyDTNN cannot compete with a production-level framework such as TF or PyTorch in terms of maturity and functionality. Instead, PyDTNN is designed as an accessible and customizable tool for prototyping ideas related to distributed training of DNN models on clusters.","PeriodicalId":170832,"journal":{"name":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks\",\"authors\":\"S. Barrachina, Adrián Castelló, M. Catalán, M. F. Dolz, José I. Mestre\",\"doi\":\"10.1109/IPDPSW52791.2021.00110\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present PyDTNN, a framework for training deep neural networks (DNNs) on clusters of computers that has been designed as a research-oriented tool with a low learning curve. Our parallel training framework offers a set of functionalities that cover several must-have features for advanced deep learning (DL) software: 1) it is developed in Python in order to expose an accessible entry point for the newcomer; 2) it is extensible, allowing users to prototype new research ideas without requiring them to deal with complex software-stacks; and 3) it delivers high parallel performance, exploiting MPI via mpi4py/NCCL for communication; and NumPy, cuDNN, and cuBLAS for computation.This paper provides practical evidence that PyDTNN attains similar accuracy and parallel performance to those exhibited by Google’s TensorFlow (TF), though we recognize that PyDTNN cannot compete with a production-level framework such as TF or PyTorch in terms of maturity and functionality. 
Instead, PyDTNN is designed as an accessible and customizable tool for prototyping ideas related to distributed training of DNN models on clusters.\",\"PeriodicalId\":170832,\"journal\":{\"name\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"volume\":\"51 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW52791.2021.00110\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW52791.2021.00110","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

We present PyDTNN, a framework for training deep neural networks (DNNs) on clusters of computers that has been designed as a research-oriented tool with a low learning curve. Our parallel training framework offers a set of functionalities that cover several must-have features for advanced deep learning (DL) software: 1) it is developed in Python in order to expose an accessible entry point for newcomers; 2) it is extensible, allowing users to prototype new research ideas without having to deal with complex software stacks; and 3) it delivers high parallel performance, exploiting MPI via mpi4py/NCCL for communication, and NumPy, cuDNN, and cuBLAS for computation. This paper provides practical evidence that PyDTNN attains accuracy and parallel performance similar to those exhibited by Google's TensorFlow (TF), though we recognize that PyDTNN cannot compete with a production-level framework such as TF or PyTorch in terms of maturity and functionality. Instead, PyDTNN is designed as an accessible and customizable tool for prototyping ideas related to the distributed training of DNN models on clusters.
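
The abstract does not spell out PyDTNN's communication scheme beyond "MPI via mpi4py/NCCL", but data-parallel training over MPI typically averages per-layer gradients with an allreduce after each backward pass. Below is a minimal sketch of that pattern using mpi4py and NumPy; it is an illustration under that assumption, not PyDTNN's actual API, and the gradient array is a hypothetical stand-in for a real layer's gradient.

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

def allreduce_average(grad):
    """Sum a gradient array across all MPI ranks, then divide by the
    number of ranks so every worker applies the same averaged update."""
    out = np.empty_like(grad)
    comm.Allreduce(grad, out, op=MPI.SUM)  # blocking sum-reduce over all ranks
    return out / comm.Get_size()

# Each rank computes gradients on its own shard of the mini-batch
# (placeholder values here), then synchronizes before the weight update.
local_grad = np.random.rand(1024).astype(np.float32)
avg_grad = allreduce_average(local_grad)

Launched with, e.g., "mpirun -np 4 python train_sketch.py", every one of the four ranks holds identical averaged gradients after the call, which keeps the model replicas in lockstep.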