Performance Comparision of TPU, GPU, CPU on Google Colaboratory Over Distributed Deep Learning

H. Kimm, Incheon Paik, Hanke Kimm
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)
Publication date: 2021-12-01
DOI: 10.1109/MCSoC51149.2021.00053 (https://doi.org/10.1109/MCSoC51149.2021.00053)
Citations: 6

Abstract

Deep learning models need massive amounts of compute power and tend to perform better when run on special-purpose processors: accelerators designed to speed up compute-intensive applications. Accelerators such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are widely used as deep learning hardware platforms and, with their massively parallel execution resources and high memory bandwidth, can often achieve better performance than CPUs. Google Colaboratory, known as Colab, is a cloud service based on Jupyter Notebook that lets users write and execute code, mostly Python, in a browser, and gives free access to TPUs and GPUs, which are widely available cloud hardware platforms, without extra configuration. In this paper, we present a thorough comparison of the hardware platforms on Google Colab, benchmarked with Distributed Bidirectional Long Short-Term Memory (dBLSTM) models while varying the number of layers, the number of units in each layer, and the numbers of input and output units of the datasets. Human Activity Recognition (HAR) data from the UCI Machine Learning Repository have been applied to the proposed distributed bidirectional LSTM model to find the performance, strengths, and bottlenecks of the TPU, GPU, and CPU hardware platforms with respect to hyperparameters, execution time, and the evaluation metrics accuracy, precision, recall, and F1 score.
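The four evaluation metrics named in the abstract can be illustrated with a short sketch. It is not taken from the paper: the abstract does not say how the per-class scores are averaged, so macro-averaging is assumed here, and the function name `classification_metrics` and the toy labels are hypothetical.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the abstract does not state which averaging the authors used.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for class t
        else:
            fp[p] += 1   # class p was predicted but wrong
            fn[t] += 1   # class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        # A class with no predictions (or no true samples) scores 0.0.
        prec = tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if (tp[c] + fn[c]) else 0.0
        f1 = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy labels for three made-up activity classes (not data from the paper).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Computing the same four numbers from the predictions of each platform (TPU, GPU, CPU) would give a like-for-like quality comparison alongside the measured execution time.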