{"title":"Agile Optimization Framework: A framework for tensor operator optimization in neural network","authors":"","doi":"10.1016/j.future.2024.07.019","DOIUrl":null,"url":null,"abstract":"<div><p>In recent years, with the gradual slowing of Moore’s Law and the development of deep learning, the demand for hardware performance of executing deep learning based applications has significantly increased. In this case, deep learning compilers have been proven to maximize hardware performance while keeping computational power constant, especially the end-to-end compiler Tensor Virtual Machine (TVM). TVM optimizes tensors by finding excellent parallel computing schemes, thereby achieving the goal of improving the performance of neural network inference. However, there is still untapped potential in current optimization methods. However, existing optimization methods based on the TVM, such as Genetic Algorithms Tuner (GA-Tuner), have failed to achieve a balance between optimization performance and optimization time. The intolerable duration of optimization detracts from TVM’s usability, rendering it challenging to extend into the scientific community. This paper introduces a novel deep learning compilation optimization framework base on TVM called Agile Optimization Framework (AOF), which incorporates a tuner based on the latest Beluga Whale Optimization Algorithm (BWO). The BWO is adept at tackling complex problems characterized by numerous local optima, making it particularly suitable for hardware compilation optimization scenarios. We further propose an Evolving Epsilon Strategy (EES), a search strategy that adaptively adjusts the balance between exploration and exploitation, thereby enhancing the effectiveness of the algorithm. Additionally, we developed a supervised Tuning Accelerator (TA) aimed at reducing the time required for optimization and enhancing efficiency. Comparative experiments demonstrate that AOF achieves 11.36%–66.20% improvement in performance and 30.30%–54.60% reduction in optimization time, significantly outperforming the control group.</p></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":null,"pages":null},"PeriodicalIF":6.2000,"publicationDate":"2024-07-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24003856","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Abstract
In recent years, with the gradual slowing of Moore's Law and the rapid development of deep learning, the hardware performance demanded by deep learning-based applications has increased significantly. Deep learning compilers, most notably the end-to-end compiler Tensor Virtual Machine (TVM), have been shown to extract more performance from existing hardware without additional computational resources. TVM optimizes tensor operators by searching for efficient parallel computing schemes, thereby improving the performance of neural network inference. However, current optimization methods still leave much of this potential untapped: existing TVM-based tuners, such as the Genetic Algorithm Tuner (GA-Tuner), fail to balance optimization performance against optimization time. The intolerably long optimization process detracts from TVM's usability and makes it difficult to adopt widely in the scientific community. This paper introduces a novel deep learning compilation optimization framework based on TVM, called the Agile Optimization Framework (AOF), which incorporates a tuner built on the recent Beluga Whale Optimization algorithm (BWO). BWO is adept at tackling complex problems characterized by numerous local optima, making it particularly well suited to hardware compilation optimization. We further propose the Evolving Epsilon Strategy (EES), a search strategy that adaptively adjusts the balance between exploration and exploitation, thereby enhancing the effectiveness of the algorithm. Additionally, we develop a supervised Tuning Accelerator (TA) aimed at reducing the time required for optimization and improving efficiency. Comparative experiments demonstrate that AOF achieves an 11.36%–66.20% improvement in performance and a 30.30%–54.60% reduction in optimization time, significantly outperforming the control group.
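To make the exploration/exploitation idea behind EES concrete, below is a minimal, self-contained Python sketch, not the paper's implementation. It runs a population-based search in which an epsilon value decays over time: candidates explore randomly with probability epsilon and otherwise move toward the best schedule found so far. The objective function is a synthetic multi-modal stand-in (many local optima, as the abstract says of tensor-schedule spaces); all names and parameters are illustrative assumptions, not AOF's or TVM's API.

```python
import math
import random

def cost(x):
    # Synthetic multi-modal objective (Rastrigin-like): many local
    # optima, a hypothetical stand-in for measured kernel latency.
    return sum(xi * xi - 10.0 * math.cos(2.0 * math.pi * xi) + 10.0 for xi in x)

def evolving_epsilon_search(dim=4, pop_size=20, iters=200, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(pop_size)]
    best = min(pop, key=cost)
    for t in range(iters):
        # The "evolving epsilon": decays over time, so the search
        # explores broadly early and exploits the best region late.
        eps = 1.0 - t / iters
        new_pop = []
        for ind in pop:
            if rng.random() < eps:
                # Exploration: random jump within the search bounds.
                cand = [rng.uniform(-5.0, 5.0) for _ in range(dim)]
            else:
                # Exploitation: move toward the best-known candidate,
                # with a small perturbation to escape local optima.
                cand = [xi + rng.uniform(0.0, 1.0) * (bi - xi) + rng.gauss(0.0, 0.1)
                        for xi, bi in zip(ind, best)]
            new_pop.append(min(ind, cand, key=cost))  # keep the better of the two
        pop = new_pop
        best = min(pop + [best], key=cost)
    return best, cost(best)

if __name__ == "__main__":
    _, c = evolving_epsilon_search()
    print(f"best cost found: {c:.4f}")
```

In this toy setting, a fixed epsilon either wastes evaluations on random jumps late in the run or gets stuck in a local minimum early; making epsilon a function of progress is the simplest version of the adaptive balance the abstract describes.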
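The supervised Tuning Accelerator is described only at a high level, but the general idea behind such accelerators is to fit a cheap learned cost model on (schedule features, measured latency) pairs and use it to rank candidates, so that only the most promising ones are actually compiled and measured on hardware. The sketch below illustrates that principle under stated assumptions: the feature vectors, the `measure` function, and the least-squares model are all synthetic placeholders, not the paper's TA or any TVM API.

```python
import numpy as np

rng = np.random.default_rng(0)

def measure(feat):
    # Hypothetical stand-in for an expensive on-device latency measurement.
    w = np.array([3.0, -2.0, 1.5, 0.5])
    return float(feat @ w + rng.normal(0.0, 0.1))

# Collect a small training set from "real" measurements.
X_train = rng.uniform(0.0, 1.0, size=(64, 4))
y_train = np.array([measure(x) for x in X_train])

# Supervised cost model: ordinary least squares, the simplest
# placeholder for whatever learned model a tuning accelerator uses.
coef, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

# Rank a large batch of untried candidates by predicted latency and
# measure only the top few, saving expensive hardware runs.
candidates = rng.uniform(0.0, 1.0, size=(1000, 4))
predicted = candidates @ coef
top_k = candidates[np.argsort(predicted)[:8]]
measured = [measure(c) for c in top_k]
print(f"best measured latency among top-8 predictions: {min(measured):.3f}")
```

The time savings reported in the abstract come from exactly this kind of filtering: model predictions are orders of magnitude cheaper than compiling and timing a candidate schedule on the target device.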
Journal Introduction
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.