{"title":"TPU、GPU、CPU在Google协作分布式深度学习上的性能比较","authors":"H. Kimm, Incheon Paik, Hanke Kimm","doi":"10.1109/MCSoC51149.2021.00053","DOIUrl":null,"url":null,"abstract":"Deep Learning models need massive amounts compute powers and tend to improve performance running on special purpose processors accelerators designed to speed up compute-intensive applications. The accelerators like Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are widely used as deep learning hardware platforms which can often achieve better performance than CPUs, with their massive parallel execution resources and high memory bandwidth. Google Colaboratory known as Colab is a cloud service based on Jupyter Notebook that allows the users to write and execute mostly Python in a browser and admits free access to TPUs and GPUs without extra configuration need, which are widely available cloud hardware platforms. In this paper, we present a through comparison of the hardware platforms on Google Colab that is benchmarked with Distributed Bidirectional Long Short-Term Memory (dBLSTM) models upon the number of layers, the number of units each layer, and the numbers of input and output units the datasets. Human Activity Recognition (HAR) data from UCI machine-learning library have been applied to the proposed distributed bidirectional LSTM model to find the performance, strengths, bottlenecks of the hardware platforms of TPU, GPU and CPU upon hyperparameters, execution time, and evaluation metrics: accuracy, precision, recall and F1 score.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Performance Comparision of TPU, GPU, CPU on Google Colaboratory Over Distributed Deep Learning\",\"authors\":\"H. Kimm, Incheon Paik, Hanke Kimm\",\"doi\":\"10.1109/MCSoC51149.2021.00053\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Deep Learning models need massive amounts compute powers and tend to improve performance running on special purpose processors accelerators designed to speed up compute-intensive applications. The accelerators like Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are widely used as deep learning hardware platforms which can often achieve better performance than CPUs, with their massive parallel execution resources and high memory bandwidth. Google Colaboratory known as Colab is a cloud service based on Jupyter Notebook that allows the users to write and execute mostly Python in a browser and admits free access to TPUs and GPUs without extra configuration need, which are widely available cloud hardware platforms. In this paper, we present a through comparison of the hardware platforms on Google Colab that is benchmarked with Distributed Bidirectional Long Short-Term Memory (dBLSTM) models upon the number of layers, the number of units each layer, and the numbers of input and output units the datasets. 
Human Activity Recognition (HAR) data from UCI machine-learning library have been applied to the proposed distributed bidirectional LSTM model to find the performance, strengths, bottlenecks of the hardware platforms of TPU, GPU and CPU upon hyperparameters, execution time, and evaluation metrics: accuracy, precision, recall and F1 score.\",\"PeriodicalId\":166811,\"journal\":{\"name\":\"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/MCSoC51149.2021.00053\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/MCSoC51149.2021.00053","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Deep learning models need massive amounts of compute and tend to perform better on special-purpose accelerators designed to speed up compute-intensive applications. Accelerators such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are widely used as deep learning hardware platforms; with their massive parallel execution resources and high memory bandwidth, they can often achieve better performance than CPUs. Google Colaboratory, known as Colab, is a cloud service based on Jupyter Notebook that lets users write and execute Python (primarily) in a browser and grants free access to TPUs and GPUs without extra configuration, making these accelerators widely available cloud hardware platforms. In this paper, we present a thorough comparison of the hardware platforms on Google Colab, benchmarked with Distributed Bidirectional Long Short-Term Memory (dBLSTM) models while varying the number of layers, the number of units per layer, and the numbers of input and output units of the datasets. Human Activity Recognition (HAR) data from the UCI machine-learning repository are applied to the proposed distributed bidirectional LSTM model to determine the performance, strengths, and bottlenecks of the TPU, GPU, and CPU hardware platforms with respect to hyperparameters, execution time, and the evaluation metrics accuracy, precision, recall, and F1 score.
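As a minimal illustration of the setup the abstract describes (not the authors' benchmark code), the sketch below shows how a Colab notebook might detect whichever backend the runtime provides (TPU, GPU, or CPU), build a small bidirectional LSTM under the matching tf.distribute strategy, and time one training epoch. The HAR-like input shape (128 timesteps x 9 sensor channels, 6 classes) and all layer sizes are assumed values, not taken from the paper.

```python
# Minimal sketch: benchmark a Bidirectional LSTM on whichever
# accelerator the Colab runtime provides (TPU, GPU, or CPU).
import time
import numpy as np
import tensorflow as tf

def get_strategy():
    """Return a tf.distribute strategy matching the available hardware."""
    try:
        # Resolver succeeds only when the Colab runtime type is set to TPU.
        resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
        tf.config.experimental_connect_to_cluster(resolver)
        tf.tpu.experimental.initialize_tpu_system(resolver)
        return tf.distribute.TPUStrategy(resolver)
    except ValueError:
        if tf.config.list_physical_devices("GPU"):
            return tf.distribute.MirroredStrategy()  # GPU runtime
        return tf.distribute.get_strategy()          # CPU fallback

# Hypothetical HAR-like data: 128 timesteps x 9 channels, 6 activity classes.
X = np.random.rand(1024, 128, 9).astype("float32")
y = np.random.randint(0, 6, size=(1024,))

strategy = get_strategy()
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 9)),
        tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
        tf.keras.layers.Dense(6, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

start = time.time()
model.fit(X, y, batch_size=64, epochs=1, verbose=0)
print(f"1 epoch: {time.time() - start:.2f}s on {type(strategy).__name__}")
```

Running the same script under Colab's three runtime types gives a rough per-epoch timing comparison in the spirit of the paper's execution-time metric.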