SLITS: Sparsity-Lightened Intelligent Thread Scheduling

Wangkai Jin, Xiangjun Peng
{"title":"SLITS:稀疏化智能线程调度","authors":"Wangkai Jin, Xiangjun Peng","doi":"10.1145/3579436","DOIUrl":null,"url":null,"abstract":"A diverse set of scheduling objectives (e.g., resource contention, fairness, priority, etc.) breed a series of objective-specific schedulers for multi-core architectures. Existing designs incorporate thread-to-thread statistics at runtime, and schedule threads based on such an abstraction (we formalize thread-to-thread interaction as the Thread-Interaction Matrix). However, such an abstraction also reveals a consistently-overlooked issue: the Thread-Interaction Matrix (TIM) is highly sparse. Therefore, existing designs can only deliver sub-optimal decisions, since the sparsity issue limits the amount of thread permutations (and its statistics) to be exploited when performing scheduling decisions. We introduce Sparsity-Lightened Intelligent Thread Scheduling (SLITS), a general scheduler design for mitigating the sparsity issue of TIM, with the customizability for different scheduling objectives. SLITS is designed upon the key insight that: the sparsity issue of the TIM can be effectively mitigated via advanced Machine Learning (ML) techniques. SLITS has three components. First, SLITS profiles Thread Interactions for only a small number of thread permutations, and form the TIM using the run-time statistics. Second, SLITS estimates the missing values in the TIM using Factorization Machine (FM), a novel ML technique that can fill in the missing values within a large-scale sparse matrix based on the limited information. Third, SLITS leverages Lazy Reschedule, a general mechanism as the building block for customizing different scheduling policies for different scheduling objectives. We show how SLITS can be (1) customized for different scheduling objectives, including resource contention and fairness; and (2) implemented with only negligible hardware costs. We also discuss how SLITS can be potentially applied to other contexts of thread scheduling. We evaluate two SLITS variants against four state-of-the-art scheduler designs. We highlight that, averaged across 11 benchmarks, SLITS achieves an average speedup of 1.08X over the de facto standard for thread scheduler - the Completely Fair Scheduler, under the 16-core setting for a variety of number of threads (i.e., 32, 64 and 128). Our analysis reveals that the benefits of SLITS are credited to significant improvements of cache utilization. In addition, our experimental results confirm that SLITS is scalable and the benefits are robust when of the number of threads increases. 
We also perform extensive studies to (1) break down SLITS components to justify the synergy of our design choices, (2) examine the impacts of varying the estimation coverage of FM, (3) justify the necessity of Lazy Reschedule rather than periodic rescheduling, and (4) demonstrate the hardware overheads for SLITS implementations can be marginal (<1% chip area and power).","PeriodicalId":426760,"journal":{"name":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"SLITS: Sparsity-Lightened Intelligent Thread Scheduling\",\"authors\":\"Wangkai Jin, Xiangjun Peng\",\"doi\":\"10.1145/3579436\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A diverse set of scheduling objectives (e.g., resource contention, fairness, priority, etc.) breed a series of objective-specific schedulers for multi-core architectures. Existing designs incorporate thread-to-thread statistics at runtime, and schedule threads based on such an abstraction (we formalize thread-to-thread interaction as the Thread-Interaction Matrix). However, such an abstraction also reveals a consistently-overlooked issue: the Thread-Interaction Matrix (TIM) is highly sparse. Therefore, existing designs can only deliver sub-optimal decisions, since the sparsity issue limits the amount of thread permutations (and its statistics) to be exploited when performing scheduling decisions. We introduce Sparsity-Lightened Intelligent Thread Scheduling (SLITS), a general scheduler design for mitigating the sparsity issue of TIM, with the customizability for different scheduling objectives. SLITS is designed upon the key insight that: the sparsity issue of the TIM can be effectively mitigated via advanced Machine Learning (ML) techniques. SLITS has three components. First, SLITS profiles Thread Interactions for only a small number of thread permutations, and form the TIM using the run-time statistics. Second, SLITS estimates the missing values in the TIM using Factorization Machine (FM), a novel ML technique that can fill in the missing values within a large-scale sparse matrix based on the limited information. Third, SLITS leverages Lazy Reschedule, a general mechanism as the building block for customizing different scheduling policies for different scheduling objectives. We show how SLITS can be (1) customized for different scheduling objectives, including resource contention and fairness; and (2) implemented with only negligible hardware costs. We also discuss how SLITS can be potentially applied to other contexts of thread scheduling. We evaluate two SLITS variants against four state-of-the-art scheduler designs. We highlight that, averaged across 11 benchmarks, SLITS achieves an average speedup of 1.08X over the de facto standard for thread scheduler - the Completely Fair Scheduler, under the 16-core setting for a variety of number of threads (i.e., 32, 64 and 128). Our analysis reveals that the benefits of SLITS are credited to significant improvements of cache utilization. In addition, our experimental results confirm that SLITS is scalable and the benefits are robust when of the number of threads increases. 
We also perform extensive studies to (1) break down SLITS components to justify the synergy of our design choices, (2) examine the impacts of varying the estimation coverage of FM, (3) justify the necessity of Lazy Reschedule rather than periodic rescheduling, and (4) demonstrate the hardware overheads for SLITS implementations can be marginal (<1% chip area and power).\",\"PeriodicalId\":426760,\"journal\":{\"name\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-02-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ACM on Measurement and Analysis of Computing Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579436\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM on Measurement and Analysis of Computing Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579436","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

A diverse set of scheduling objectives (e.g., resource contention, fairness, priority) breeds a series of objective-specific schedulers for multi-core architectures. Existing designs incorporate thread-to-thread statistics at runtime and schedule threads based on this abstraction (we formalize thread-to-thread interaction as the Thread-Interaction Matrix). However, the abstraction also reveals a consistently overlooked issue: the Thread-Interaction Matrix (TIM) is highly sparse. Existing designs can therefore deliver only sub-optimal decisions, since sparsity limits the number of thread permutations (and their statistics) that can be exploited when making scheduling decisions. We introduce Sparsity-Lightened Intelligent Thread Scheduling (SLITS), a general scheduler design that mitigates the sparsity of the TIM and can be customized for different scheduling objectives. SLITS builds on the key insight that the sparsity of the TIM can be effectively mitigated via advanced Machine Learning (ML) techniques. SLITS has three components. First, SLITS profiles Thread Interactions for only a small number of thread permutations and forms the TIM from the run-time statistics. Second, SLITS estimates the missing values in the TIM using Factorization Machines (FM), an ML technique that fills in the missing values of a large-scale sparse matrix from limited information. Third, SLITS leverages Lazy Reschedule, a general mechanism that serves as the building block for customizing scheduling policies to different scheduling objectives. We show how SLITS can be (1) customized for different scheduling objectives, including resource contention and fairness, and (2) implemented with only negligible hardware costs. We also discuss how SLITS can potentially be applied to other thread-scheduling contexts.

We evaluate two SLITS variants against four state-of-the-art scheduler designs. Averaged across 11 benchmarks, SLITS achieves a speedup of 1.08X over the de facto standard thread scheduler, the Completely Fair Scheduler, under a 16-core setting for a range of thread counts (i.e., 32, 64, and 128). Our analysis reveals that the benefits of SLITS stem from significant improvements in cache utilization. In addition, our experimental results confirm that SLITS is scalable and that its benefits remain robust as the number of threads increases. We also perform extensive studies to (1) break down the SLITS components to justify the synergy of our design choices, (2) examine the impact of varying the estimation coverage of FM, (3) justify the necessity of Lazy Reschedule over periodic rescheduling, and (4) demonstrate that the hardware overheads of SLITS implementations are marginal (<1% chip area and power).
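To make the second component concrete, the sketch below illustrates the matrix-completion idea behind FM-based estimation on a toy TIM: profile interaction scores for a handful of thread pairs, then infer the remaining entries from learned latent factors. This is a minimal low-rank factorization sketch of the underlying idea, not the paper's implementation or a full Factorization Machine; the thread count, observed scores, and hyperparameters are illustrative assumptions.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): estimate missing entries of
# a sparse Thread-Interaction Matrix (TIM) via low-rank factorization, the
# core idea that Factorization Machines generalize. Thread counts, observed
# scores, and hyperparameters below are illustrative assumptions.

rng = np.random.default_rng(0)
n_threads = 32            # one of the thread counts evaluated in the paper
rank = 4                  # latent dimension of the factorization
lr, reg, epochs = 0.01, 0.05, 500

# Interaction scores measured for only a few profiled thread pairs, e.g.,
# the slowdown observed when threads i and j co-run on shared resources.
observed = {(0, 1): 1.3, (0, 5): 0.9, (2, 3): 1.7, (4, 7): 1.1, (5, 6): 1.4}

# Approximate TIM[i, j] by the dot product U[i] @ V[j] of latent factors.
U = rng.normal(scale=0.1, size=(n_threads, rank))
V = rng.normal(scale=0.1, size=(n_threads, rank))

for _ in range(epochs):
    for (i, j), y in observed.items():
        err = y - U[i] @ V[j]
        u_i = U[i].copy()  # keep the pre-update value for V's gradient
        # SGD step on the squared error with L2 regularization.
        U[i] += lr * (err * V[j] - reg * U[i])
        V[j] += lr * (err * u_i - reg * V[j])

tim_estimate = U @ V.T    # dense estimate: any (i, j) pair is now queryable
print(round(float(tim_estimate[0, 1]), 2))  # should approach the observed 1.3
```

A real FM can additionally incorporate per-thread feature vectors so that estimates generalize beyond the profiled pairs, which is what allows the profiled fraction of permutations to stay small.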
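The abstract contrasts Lazy Reschedule with periodic rescheduling but does not detail its mechanism. Purely to illustrate that contrast, the sketch below shows an event-gated scheduler loop that recomputes thread placement only when a policy-specific trigger fires, rather than on every tick; the trigger, the placement policy, and all names are hypothetical and do not come from the paper.

```python
import random
from typing import Dict, List

# Hypothetical sketch (not the paper's mechanism): gate the expensive
# placement computation behind a policy-specific trigger instead of
# recomputing it on every tick.

def compute_placement(threads: List[int], n_cores: int) -> Dict[int, int]:
    """Stand-in for an objective-specific policy built on the estimated TIM."""
    return {t: i % n_cores for i, t in enumerate(threads)}

def contention_drifted(threshold: float = 0.8) -> bool:
    """Stand-in trigger: fires when a monitored estimate drifts too far."""
    return random.random() > threshold

def run(ticks: int, threads: List[int], n_cores: int = 16) -> int:
    placement = compute_placement(threads, n_cores)
    reschedules = 0
    for _ in range(ticks):
        if contention_drifted():          # lazy: reschedule only on demand
            placement = compute_placement(threads, n_cores)
            reschedules += 1
        # ... dispatch threads according to `placement` ...
    return reschedules

random.seed(0)
print(run(ticks=100, threads=list(range(32))))  # far fewer than 100 reschedules
```

A periodic scheduler would call compute_placement on every tick regardless of need; the lazy variant pays that cost only when the trigger indicates the current placement has gone stale.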