{"title":"人工智能处理器架构寒武纪大爆发中的适者生存:特邀论文","authors":"S. Sukumar, J. Balma, Cong Xu, S. Serebryakov","doi":"10.1109/PEHC54839.2021.00010","DOIUrl":null,"url":null,"abstract":"The need for high performance computing in data-driven artificial intelligence (AI) workloads has led to the Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud-services, we need to understand different figures-of-merit for device-, server- and rack-scale systems. Towards that goal, we share early-access hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. Ranking the benefits based on different figures of merit such as cost, energy, and adoption efficiency reveals a \"heterogenous\" future for production systems with multiple processor architectures in the edge-to-datacenter AI workflow.Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. Our approach profiles the neural network model for memory, bandwidth and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. 
We discuss how the interpretation of the roofline analysis can guide system architecture to deliver productive performance and conclude with recommendations to survive the Cambrian explosion.","PeriodicalId":147071,"journal":{"name":"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence : Invited Paper\",\"authors\":\"S. Sukumar, J. Balma, Cong Xu, S. Serebryakov\",\"doi\":\"10.1109/PEHC54839.2021.00010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The need for high performance computing in data-driven artificial intelligence (AI) workloads has led to the Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud-services, we need to understand different figures-of-merit for device-, server- and rack-scale systems. Towards that goal, we share early-access hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. Ranking the benefits based on different figures of merit such as cost, energy, and adoption efficiency reveals a \\\"heterogenous\\\" future for production systems with multiple processor architectures in the edge-to-datacenter AI workflow.Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. 
Our approach profiles the neural network model for memory, bandwidth and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. We discuss how the interpretation of the roofline analysis can guide system architecture to deliver productive performance and conclude with recommendations to survive the Cambrian explosion.\",\"PeriodicalId\":147071,\"journal\":{\"name\":\"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PEHC54839.2021.00010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE/ACM Programming Environments for Heterogeneous Computing (PEHC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PEHC54839.2021.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Survival of the Fittest Amidst the Cambrian Explosion of Processor Architectures for Artificial Intelligence: Invited Paper
The need for high-performance computing in data-driven artificial intelligence (AI) workloads has led to a Cambrian explosion of processor architectures. As these novel processor architectures aim to evolve and thrive inside datacenters and cloud services, we need to understand different figures of merit for device-, server-, and rack-scale systems. Toward that goal, we share early-access, hands-on experience with these processor/accelerator architectures. We describe an evaluation plan that includes carefully chosen neural network models to gauge the maturity of the hardware and software ecosystem. Our hands-on evaluation using benchmarks reveals significant benefits of hardware acceleration while exposing several blind spots in the software ecosystem. Ranking the benefits by figures of merit such as cost, energy, and adoption efficiency reveals a "heterogeneous" future for production systems, with multiple processor architectures in the edge-to-datacenter AI workflow. Preparing to survive in this heterogeneous future, we describe a method to profile and predict the performance benefits of a deep learning training workload on novel architectures. Our approach profiles the neural network model for memory, bandwidth, and compute requirements by analyzing the model definition. Then, using profiling tools, we estimate the I/O and arithmetic-intensity requirements at different batch sizes. By overlaying profiler results onto analytic roofline models of the emerging processor architectures, we identify opportunities for potential acceleration. We discuss how interpretation of the roofline analysis can guide system architecture toward productive performance, and we conclude with recommendations for surviving the Cambrian explosion.
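The roofline methodology the abstract describes can be sketched in a few lines: attainable performance is the minimum of a device's peak compute rate and its memory bandwidth multiplied by a kernel's arithmetic intensity (FLOPs per byte moved). The device numbers below are illustrative assumptions, not figures from the paper, and the helper names are ours.

```python
# Minimal roofline-model sketch. Device parameters are hypothetical
# placeholders, not measurements from the paper's evaluation.

def attainable_gflops(arithmetic_intensity, peak_gflops, bandwidth_gb_s):
    """Roofline bound: min(compute roof, memory-bandwidth slope * intensity)."""
    return min(peak_gflops, bandwidth_gb_s * arithmetic_intensity)

# Assumed accelerator figures (GFLOP/s and GB/s) for illustration only.
PEAK_GFLOPS = 10000.0
BANDWIDTH_GB_S = 900.0

# The ridge point is the intensity at which the kernel stops being
# memory-bound and the compute roof takes over.
ridge = PEAK_GFLOPS / BANDWIDTH_GB_S

# Arithmetic intensity typically grows with batch size, so sweeping a few
# intensities mimics overlaying profiler results at different batch sizes.
for ai in (1.0, 5.0, ridge, 50.0):
    bound = "memory-bound" if ai < ridge else "compute-bound"
    gflops = attainable_gflops(ai, PEAK_GFLOPS, BANDWIDTH_GB_S)
    print(f"AI = {ai:6.2f} FLOP/byte -> {gflops:8.1f} GFLOP/s ({bound})")
```

Overlaying a workload's measured intensity on this curve shows at a glance whether a novel architecture's extra compute would actually help, or whether the kernel sits left of the ridge point and is limited by memory traffic instead.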