Distributed MLPerf ResNet50 Training on Intel Xeon Architectures with TensorFlow

Wei Wang, N. Hasabnis
Published in: The International Conference on High Performance Computing in Asia-Pacific Region Companion
Publication date: 2021-01-20
DOI: 10.1145/3440722.3440880 (https://doi.org/10.1145/3440722.3440880)
Citations: 0

Abstract

MLPerf benchmarks, which measure the training and inference performance of ML hardware and software, have published three sets of ML training results so far. In all three sets, ResNet50v1.5 was used as a standard benchmark to showcase the latest developments in image recognition. The latest MLPerf training round (v0.7) featured Intel's submission with TensorFlow. In this paper, we describe the recent optimization work that enabled this submission. In particular, we enabled the BFloat16 data type in the ResNet50v1.5 model as well as in Intel-optimized TensorFlow to exploit the full potential of 3rd generation Intel Xeon Scalable processors, which have built-in BFloat16 support. We also describe the performance optimizations as well as the state-of-the-art accuracy/convergence results of the ResNet50v1.5 model, achieved with large-scale distributed training (with up to 256 MPI workers) using Horovod. These results lay a solid foundation for future MLPerf training submissions on large-scale Intel Xeon clusters.
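
For readers unfamiliar with the BFloat16 data type mentioned in the abstract, the following is a minimal sketch of the number format itself, not of the authors' TensorFlow changes. BFloat16 keeps the sign bit and the 8 exponent bits of an IEEE float32 but only 7 mantissa bits, so a conversion amounts to keeping the upper 16 bits of the float32 pattern. The function names are hypothetical, and the sketch assumes round-to-nearest-even and does not treat NaN or overflow specially.

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return the 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumes round-to-nearest-even; NaN and overflow are not special-cased.
    """
    # Reinterpret the value as its 32-bit IEEE float32 pattern.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add a bias so that dropping the low 16 bits rounds to nearest,
    # with ties going to the pattern whose last kept bit is 0.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Expand a 16-bit bfloat16 pattern back to a Python float."""
    # The missing low 16 mantissa bits are filled with zeros.
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


if __name__ == "__main__":
    for value in (1.0, 3.14159, 1e-3, 65504.0):
        b = float32_to_bfloat16_bits(value)
        print(f"{value!r} -> 0x{b:04X} -> {bfloat16_bits_to_float(b)!r}")
```

The round trip shows the trade-off: the dynamic range matches float32, while precision drops to roughly two to three decimal digits.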