Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence: Invited Paper

S. Sukumar, J. Balma, Cong Xu, S. Serebryakov
{"title":"人工智能处理器架构寒武纪大爆发中的适者生存:特邀论文","authors":"S. Sukumar, J. Balma, Cong Xu, S. Serebryakov","doi":"10.1109/PEHC54839.2021.00010","DOIUrl":null,"url":null,"abstract":"The need for high performance computing in data-driven artificial intelligence (AI) workloads has led to the Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud-services, we need to understand different figures-of-merit for device-, server- and rack-scale systems. Towards that goal, we share early-access hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. Ranking the benefits based on different figures of merit such as cost, energy, and adoption efficiency reveals a \"heterogenous\" future for production systems with multiple processor architectures in the edge-to-datacenter AI workflow.Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. Our approach profiles the neural network model for memory, bandwidth and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. We discuss how the interpretation of the roofline analysis can guide system architecture to deliver productive performance and conclude with recommendations to survive the Cambrian explosion.","PeriodicalId":147071,"journal":{"name":"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence : Invited Paper\",\"authors\":\"S. Sukumar, J. Balma, Cong Xu, S. Serebryakov\",\"doi\":\"10.1109/PEHC54839.2021.00010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The need for high performance computing in data-driven artificial intelligence (AI) workloads has led to the Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud-services, we need to understand different figures-of-merit for device-, server- and rack-scale systems. Towards that goal, we share early-access hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. 
Ranking the benefits based on different figures of merit such as cost, energy, and adoption efficiency reveals a \\\"heterogenous\\\" future for production systems with multiple processor architectures in the edge-to-datacenter AI workflow.Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. Our approach profiles the neural network model for memory, bandwidth and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. We discuss how the interpretation of the roofline analysis can guide system architecture to deliver productive performance and conclude with recommendations to survive the Cambrian explosion.\",\"PeriodicalId\":147071,\"journal\":{\"name\":\"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PEHC54839.2021.00010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PEHC54839.2021.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cited by: 1

Abstract

The need for high-performance computing in data-driven artificial intelligence (AI) workloads has led to a Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud services, we need to understand the different figures of merit for device-, server-, and rack-scale systems. Toward that goal, we share early-access, hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. Ranking the benefits by different figures of merit, such as cost, energy, and adoption efficiency, reveals a "heterogeneous" future for production systems with multiple processor architectures in the edge-to-datacenter AI workflow.

Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. Our approach profiles the neural network model for memory, bandwidth, and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic-intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. We discuss how the interpretation of the roofline analysis can guide system architecture to deliver productive performance, and we conclude with recommendations for surviving the Cambrian explosion.
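The roofline approach described in the abstract can be made concrete with a small calculation. The sketch below is only illustrative and is not the authors' profiler: the device peak compute, memory bandwidth, GEMM dimensions, and fp16 operand size are all assumed values, chosen to show how raising the batch size raises arithmetic intensity and moves a kernel from the memory-bound to the compute-bound regime of the roofline.

```python
# Minimal roofline sketch (illustrative only; all device numbers are assumptions).
# attainable = min(peak compute, memory bandwidth * arithmetic intensity)

def attainable_tflops(arithmetic_intensity, peak_tflops, bandwidth_tbs):
    """Return attainable throughput (TFLOP/s) under the roofline model.

    arithmetic_intensity -- FLOPs per byte moved to/from memory
    peak_tflops          -- device peak compute in TFLOP/s (assumed)
    bandwidth_tbs        -- device memory bandwidth in TB/s (assumed)
    """
    return min(peak_tflops, bandwidth_tbs * arithmetic_intensity)


def gemm_intensity(m, n, k, bytes_per_element=2):  # fp16 operands assumed
    """Rough arithmetic intensity of an (m x k) * (k x n) GEMM.

    FLOPs ~= 2*m*n*k; bytes moved ~= the three operand matrices read/written once.
    """
    flops = 2 * m * n * k
    bytes_moved = bytes_per_element * (m * k + k * n + m * n)
    return flops / bytes_moved


if __name__ == "__main__":
    # Hypothetical accelerator: 100 TFLOP/s peak, 1.5 TB/s memory bandwidth.
    peak, bw = 100.0, 1.5
    for batch in (1, 8, 64, 512):
        ai = gemm_intensity(m=batch, n=4096, k=4096)
        perf = attainable_tflops(ai, peak, bw)
        bound = "compute-bound" if perf >= peak else "memory-bound"
        print(f"batch={batch:4d}  AI={ai:7.1f} FLOP/B  "
              f"attainable={perf:6.1f} TFLOP/s  ({bound})")
```

Running the sketch prints the attainable throughput at each batch size; in a real study the assumed numbers would be replaced by profiler measurements and the target accelerator's published peak compute and bandwidth figures.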