{"title":"HCW 2022 Keynote Speaker: Heterogeneous Computing for Scientific Machine Learning","authors":"L. White","doi":"10.1109/IPDPSW55747.2022.00011","DOIUrl":null,"url":null,"abstract":"More than ever, the semiconductor industry is asked to answer society's call for more computing capacity and capability, which are driven by rapid digitalization, the widespread adoption of artificial intelligence, and the ever-increasing need for high-fidelity scientific simulations. While facing high demand, the supply of computing capability is being technically challenged by the slowdown of Moore's law and the need for high energy efficiency. This tug-of-war has now pushed the industry towards domain-specific accelerators, perhaps likely past the point of no return. The mix of general-purpose CPUs and high-end GPGPUs, which has pervaded data centers over the past few years, is likely to be expanded to a much richer set of application-specific accelerators, including AI engines, reconfigurable hardware, and even perhaps quantum, annealing, and neuromorphic devices. While acceleration and better efficiency may be enabled by using domain-specific accelerators for selected workloads, a much more holistic (i.e., system-wide) approach will have to be adopted to achieve significant performance gains for complex applications that consist of a variety of workloads where each could benefit from a specific accelerator. As an important example, scientific computing, which increasingly incorporates AI training and inference kernels in a tightly-integrated fashion, provides a rich and exciting laboratory for addressing the challenges of efficiently using highly-heterogeneous systems and for ultimately realizing their promises. Those challenges include co-designing the application, which requires domain experts to collaborate with other experts across the stack for workload mapping and data orchestration, and also adopting a decentralized strategy that embeds processing units where the data need them. Finally, the early experience of those co-design efforts should help the industry devise a longer-term strategy for developing programming models that would relieve application experts from what is often perceived as the burden of hardwareaware development and code optimization.","PeriodicalId":286968,"journal":{"name":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW55747.2022.00011","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
More than ever, the semiconductor industry is asked to answer society's call for more computing capacity and capability, driven by rapid digitalization, the widespread adoption of artificial intelligence, and the ever-increasing need for high-fidelity scientific simulations. While demand is high, the supply of computing capability is technically challenged by the slowdown of Moore's law and the need for high energy efficiency. This tug-of-war has now pushed the industry towards domain-specific accelerators, likely past the point of no return. The mix of general-purpose CPUs and high-end GPGPUs that has pervaded data centers over the past few years is likely to expand into a much richer set of application-specific accelerators, including AI engines, reconfigurable hardware, and perhaps even quantum, annealing, and neuromorphic devices. While domain-specific accelerators can deliver acceleration and better efficiency for selected workloads, a much more holistic (i.e., system-wide) approach will have to be adopted to achieve significant performance gains for complex applications composed of a variety of workloads, each of which could benefit from a specific accelerator. As an important example, scientific computing, which increasingly incorporates AI training and inference kernels in a tightly integrated fashion, provides a rich and exciting laboratory for addressing the challenges of efficiently using highly heterogeneous systems and for ultimately realizing their promise. Those challenges include co-designing the application, which requires domain experts to collaborate with experts across the stack on workload mapping and data orchestration, and adopting a decentralized strategy that embeds processing units where the data need them. Finally, the early experience of these co-design efforts should help the industry devise a longer-term strategy for developing programming models that relieve application experts of what is often perceived as the burden of hardware-aware development and code optimization.
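To make the workload-mapping challenge concrete, the following is a minimal sketch (not taken from the talk) of the pattern the abstract alludes to: a simulation kernel and an AI inference kernel placed on different devices of a heterogeneous system, here expressed in JAX. The device choices, the toy diffusion step, and the linear "surrogate" model are illustrative assumptions, not the speaker's implementation.

```python
# Illustrative sketch: mapping two kinds of kernels onto different devices.
import jax
import jax.numpy as jnp

devices = jax.devices()        # e.g., [cpu] on a laptop, [gpu_0, gpu_1, ...] on a server
sim_dev = devices[0]           # device assumed to host the simulation kernel
ai_dev = devices[-1]           # device assumed to host the AI surrogate

@jax.jit
def diffusion_step(u, dt=1e-3):
    # Toy 1-D heat-equation update standing in for a high-fidelity simulation kernel.
    lap = jnp.roll(u, 1) + jnp.roll(u, -1) - 2.0 * u
    return u + dt * lap

@jax.jit
def surrogate_inference(u, w):
    # Toy linear model standing in for a trained AI inference kernel.
    return jnp.tanh(u @ w)

u = jax.device_put(jnp.linspace(0.0, 1.0, 128), sim_dev)   # simulation state on sim_dev
w = jax.device_put(jnp.ones((128, 16)) / 128.0, ai_dev)    # surrogate weights on ai_dev

for _ in range(10):
    u = diffusion_step(u)                                   # executes where `u` is committed
features = surrogate_inference(jax.device_put(u, ai_dev), w)  # explicit data orchestration
print(features.shape)                                       # (16,)
```

Even in this toy form, the explicit device_put calls show where the "data orchestration" burden lands on the application expert; the programming models the abstract calls for would aim to absorb exactly this kind of hardware-aware placement.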