信标:预测和利用动态循环特性的端到端编译器框架

IF 2.2 Q2 COMPUTER SCIENCE, SOFTWARE ENGINEERING
Girish Mururu, Sharjeel Khan, Bodhisatwa Chatterjee, Chao Chen, Chris Porter, Ada Gavrilovska, Santosh Pande
{"title":"信标:预测和利用动态循环特性的端到端编译器框架","authors":"Girish Mururu, Sharjeel Khan, Bodhisatwa Chatterjee, Chao Chen, Chris Porter, Ada Gavrilovska, Santosh Pande","doi":"10.1145/3622803","DOIUrl":null,"url":null,"abstract":"Efficient management of shared resources is a critical problem in high-performance computing (HPC) environments. Existing workload management systems often promote non-sharing of resources among different co-executing applications to achieve performance isolation. Such schemes lead to poor resource utilization and suboptimal process throughput, adversely affecting user productivity. Tackling this problem in a scalable fashion is extremely challenging, since it requires the workload scheduler to possess an in-depth knowledge about various application resource requirements and runtime phases at fine granularities within individual applications. In this work, we show that applications’ resource requirements and execution phase behaviour can be captured in a scalable and lightweight manner at runtime by estimating important program artifacts termed as “ dynamic loop characteristics ”. Specifically, we propose a solution to the problem of efficient workload scheduling by designing a compiler and runtime cooperative framework that leverages novel loop-based compiler analysis for resource allocation . We present Beacons Framework , an end-to-end compiler and scheduling framework, that estimates dynamic loop characteristics, encapsulates them in compiler-instrumented beacons in an application, and broadcasts them during application runtime, for proactive workload scheduling. We focus on estimating four important loop characteristics : loop trip-count , loop timing , loop memory footprint , and loop data-reuse behaviour , through a combination of compiler analysis and machine learning. The novelty of the Beacons Framework also lies in its ability to tackle irregular loops that exhibit complex control flow with indeterminate loop bounds involving structure fields, aliased variables and function calls , which are highly prevalent in modern workloads. At the backend, Beacons Framework entails a proactive workload scheduler that leverages the runtime information to orchestrate aggressive process co-locations, for maximizing resource concurrency, without causing cache thrashing . Our results show that Beacons Framework can predict different loop characteristics with an accuracy of 85% to 95% on average, and the proactive scheduler obtains an average throughput improvement of 1.9x (up to 3.2x ) over the state-of-the-art schedulers on an Amazon Graviton2 machine on consolidated workloads involving 1000-10000 co-executing processes, across 51 benchmarks.","PeriodicalId":20697,"journal":{"name":"Proceedings of the ACM on Programming Languages","volume":"36 1","pages":"0"},"PeriodicalIF":2.2000,"publicationDate":"2023-10-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics\",\"authors\":\"Girish Mururu, Sharjeel Khan, Bodhisatwa Chatterjee, Chao Chen, Chris Porter, Ada Gavrilovska, Santosh Pande\",\"doi\":\"10.1145/3622803\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Efficient management of shared resources is a critical problem in high-performance computing (HPC) environments. Existing workload management systems often promote non-sharing of resources among different co-executing applications to achieve performance isolation. Such schemes lead to poor resource utilization and suboptimal process throughput, adversely affecting user productivity. Tackling this problem in a scalable fashion is extremely challenging, since it requires the workload scheduler to possess an in-depth knowledge about various application resource requirements and runtime phases at fine granularities within individual applications. In this work, we show that applications’ resource requirements and execution phase behaviour can be captured in a scalable and lightweight manner at runtime by estimating important program artifacts termed as “ dynamic loop characteristics ”. Specifically, we propose a solution to the problem of efficient workload scheduling by designing a compiler and runtime cooperative framework that leverages novel loop-based compiler analysis for resource allocation . We present Beacons Framework , an end-to-end compiler and scheduling framework, that estimates dynamic loop characteristics, encapsulates them in compiler-instrumented beacons in an application, and broadcasts them during application runtime, for proactive workload scheduling. We focus on estimating four important loop characteristics : loop trip-count , loop timing , loop memory footprint , and loop data-reuse behaviour , through a combination of compiler analysis and machine learning. The novelty of the Beacons Framework also lies in its ability to tackle irregular loops that exhibit complex control flow with indeterminate loop bounds involving structure fields, aliased variables and function calls , which are highly prevalent in modern workloads. At the backend, Beacons Framework entails a proactive workload scheduler that leverages the runtime information to orchestrate aggressive process co-locations, for maximizing resource concurrency, without causing cache thrashing . Our results show that Beacons Framework can predict different loop characteristics with an accuracy of 85% to 95% on average, and the proactive scheduler obtains an average throughput improvement of 1.9x (up to 3.2x ) over the state-of-the-art schedulers on an Amazon Graviton2 machine on consolidated workloads involving 1000-10000 co-executing processes, across 51 benchmarks.\",\"PeriodicalId\":20697,\"journal\":{\"name\":\"Proceedings of the ACM on Programming Languages\",\"volume\":\"36 1\",\"pages\":\"0\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2023-10-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Programming Languages\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3622803\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, SOFTWARE ENGINEERING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Programming Languages","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3622803","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, SOFTWARE ENGINEERING","Score":null,"Total":0}
引用次数: 0

摘要

共享资源的有效管理是高性能计算环境中的一个关键问题。现有的工作负载管理系统通常提倡在不同的协同执行应用程序之间不共享资源,以实现性能隔离。这样的方案导致资源利用率低下和次优流程吞吐量,对用户生产力产生不利影响。以可伸缩的方式解决这个问题是极具挑战性的,因为它要求工作负载调度器对各个应用程序中的各种应用程序资源需求和运行时阶段有深入的了解。在这项工作中,我们展示了应用程序的资源需求和执行阶段行为可以通过评估被称为“动态循环特征”的重要程序工件,在运行时以可伸缩和轻量级的方式捕获。具体来说,我们通过设计一个编译器和运行时协作框架来解决高效工作负载调度问题,该框架利用新颖的基于循环的编译器分析来进行资源分配。我们提出了Beacons Framework,这是一个端到端编译器和调度框架,它估计动态循环特征,将它们封装在应用程序中的编译器配置的信标中,并在应用程序运行时广播它们,以进行主动工作负载调度。通过编译器分析和机器学习的结合,我们专注于估计四个重要的循环特征:循环行程计数、循环定时、循环内存占用和循环数据重用行为。Beacons框架的新颖之处还在于它能够处理不规则循环,这些循环表现出复杂的控制流,包含不确定的循环边界,涉及结构字段、别名变量和函数调用,这在现代工作负载中非常普遍。在后端,Beacons Framework需要一个主动的工作负载调度器,该调度器利用运行时信息编排积极的进程共存,以最大化资源并发性,而不会导致缓存抖动。我们的结果表明,Beacons Framework可以预测不同的循环特征,平均准确率为85%至95%,并且在涉及1000-10000个协同执行进程的合并工作负载上,主动调度器在51个基准测试中,比Amazon Graviton2机器上最先进的调度器平均吞吐量提高1.9倍(最高可达3.2倍)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Beacons: An End-to-End Compiler Framework for Predicting and Utilizing Dynamic Loop Characteristics
Efficient management of shared resources is a critical problem in high-performance computing (HPC) environments. Existing workload management systems often promote non-sharing of resources among different co-executing applications to achieve performance isolation. Such schemes lead to poor resource utilization and suboptimal process throughput, adversely affecting user productivity. Tackling this problem in a scalable fashion is extremely challenging, since it requires the workload scheduler to possess an in-depth knowledge about various application resource requirements and runtime phases at fine granularities within individual applications. In this work, we show that applications’ resource requirements and execution phase behaviour can be captured in a scalable and lightweight manner at runtime by estimating important program artifacts termed as “ dynamic loop characteristics ”. Specifically, we propose a solution to the problem of efficient workload scheduling by designing a compiler and runtime cooperative framework that leverages novel loop-based compiler analysis for resource allocation . We present Beacons Framework , an end-to-end compiler and scheduling framework, that estimates dynamic loop characteristics, encapsulates them in compiler-instrumented beacons in an application, and broadcasts them during application runtime, for proactive workload scheduling. We focus on estimating four important loop characteristics : loop trip-count , loop timing , loop memory footprint , and loop data-reuse behaviour , through a combination of compiler analysis and machine learning. The novelty of the Beacons Framework also lies in its ability to tackle irregular loops that exhibit complex control flow with indeterminate loop bounds involving structure fields, aliased variables and function calls , which are highly prevalent in modern workloads. At the backend, Beacons Framework entails a proactive workload scheduler that leverages the runtime information to orchestrate aggressive process co-locations, for maximizing resource concurrency, without causing cache thrashing . Our results show that Beacons Framework can predict different loop characteristics with an accuracy of 85% to 95% on average, and the proactive scheduler obtains an average throughput improvement of 1.9x (up to 3.2x ) over the state-of-the-art schedulers on an Amazon Graviton2 machine on consolidated workloads involving 1000-10000 co-executing processes, across 51 benchmarks.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
Proceedings of the ACM on Programming Languages
Proceedings of the ACM on Programming Languages Engineering-Safety, Risk, Reliability and Quality
CiteScore
5.20
自引率
22.20%
发文量
192
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信