NEST-C: A deep learning compiler framework for heterogeneous computing systems with artificial intelligence accelerators

IF 1.3, Zone 4 (Computer Science), JCR Q3 (ENGINEERING, ELECTRICAL & ELECTRONIC)
Jeman Park, Misun Yu, Jinse Kwon, Junmo Park, Jemin Lee, Yongin Kwon
ETRI Journal, vol. 46, no. 5, pp. 851-864. Published 2024-10-28. DOI: 10.4218/etrij.2024-0139
https://onlinelibrary.wiley.com/doi/10.4218/etrij.2024-0139
Citations: 0

Abstract


Deep learning (DL) has significantly advanced artificial intelligence (AI); however, frameworks such as PyTorch, ONNX, and TensorFlow are optimized for general-purpose GPUs, leading to inefficiencies on specialized accelerators such as neural processing units (NPUs) and processing-in-memory (PIM) devices. These accelerators are designed to optimize both throughput and energy efficiency, but they require more tailored optimizations. To address these limitations, we propose the NEST compiler (NEST-C), a novel DL framework that improves the deployment and performance of models across various AI accelerators. NEST-C leverages profiling-based quantization, dynamic graph partitioning, and multi-level intermediate representation (IR) integration for efficient execution on diverse hardware platforms. Our results show that NEST-C significantly enhances computational efficiency and adaptability across various AI accelerators, achieving higher throughput, lower latency, improved resource utilization, and greater model portability. These benefits contribute to more efficient DL model deployment in modern AI applications.
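
The abstract names profiling-based quantization without describing how it works. As an illustration only, and not NEST-C's actual implementation, the general idea can be sketched as follows: run calibration inputs through the model, record the range of values each tensor takes, and derive an int8 scale and zero point from that range. All names here (`QuantParams`, `profile_range`, `choose_params`, `quantize`, `dequantize`) are hypothetical, and the asymmetric int8 scheme is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class QuantParams:
    scale: float      # real-valued step between adjacent integer codes
    zero_point: int   # integer code that represents the real value 0.0


def profile_range(batches: Iterable[List[float]]) -> Tuple[float, float]:
    """Track the smallest and largest value seen across calibration batches."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def choose_params(lo: float, hi: float, qmin: int = -128, qmax: int = 127) -> QuantParams:
    """Map the profiled range [lo, hi] onto the integer range [qmin, qmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    if hi == lo:
        # Degenerate case: every profiled value was zero.
        return QuantParams(scale=1.0, zero_point=0)
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return QuantParams(scale, max(qmin, min(qmax, zero_point)))


def quantize(x: float, p: QuantParams, qmin: int = -128, qmax: int = 127) -> int:
    """Convert a real value to its clamped integer code."""
    q = round(x / p.scale) + p.zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, p: QuantParams) -> float:
    """Recover the approximate real value from an integer code."""
    return (q - p.zero_point) * p.scale
```

In this sketch the profiled range fixes the quantization parameters once, ahead of deployment, so the accelerator only ever sees integer tensors plus a per-tensor scale and zero point. Values outside the profiled range are clamped, which is why the choice of calibration data matters.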

Source journal
ETRI Journal (Engineering & Technology: Telecommunications)
CiteScore: 4.00
Self-citation rate: 7.10%
Articles published: 98
Review time: 6.9 months
Journal description: ETRI Journal is an international, peer-reviewed multidisciplinary journal published bimonthly in English. The main focus of the journal is to provide an open forum to exchange innovative ideas and technology in the fields of information, telecommunications, and electronics. Key topics of interest include high-performance computing, big data analytics, cloud computing, multimedia technology, communication networks and services, wireless communications and mobile computing, material and component technology, as well as security. With an international editorial committee and experts from around the world as reviewers, ETRI Journal publishes high-quality research papers on the latest and best developments from the global community.