Snug:对芯片多处理器中放松并发优先级队列的体系结构支持

Proceedings of the 34th ACM International Conference on Supercomputing Pub Date : 2020-06-29 DOI:10.1145/3392717.3392740

Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, J. Torrellas

{"title":"Snug:对芯片多处理器中放松并发优先级队列的体系结构支持","authors":"Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, J. Torrellas","doi":"10.1145/3392717.3392740","DOIUrl":null,"url":null,"abstract":"Many parallel algorithms in domains such as graph analytics and simulations rely on priority-based task scheduling. In such environments, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On one hand, strict PQs always dequeue the highest-priority task, and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are sometimes so far from the head that the resulting schedule misses the benefit of priority-based scheduling. We propose a novel architecture for relaxing PQs without straying far from the priority-based schedule. Our chip-level architecture, called Snug, distributes the PQ into subqueues, and maintains a set of Work registers that point to the highest-priority task in each sub-queue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between obtaining an accurate global snapshot, and visiting only local subqueues to reduce traffic. Overall, Snug dequeues high-quality tasks while avoiding both hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation programs. On a simulated 64-core chip, Snug reduces the average execution time of the programs by 1.4X, 2.4X and 3.6X compared to state-of-the-art concurrent skip list, SprayList, and software-distributed PQs, respectively.","PeriodicalId":346687,"journal":{"name":"Proceedings of the 34th ACM International Conference on Supercomputing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors\",\"authors\":\"Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, J. Torrellas\",\"doi\":\"10.1145/3392717.3392740\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many parallel algorithms in domains such as graph analytics and simulations rely on priority-based task scheduling. In such environments, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On one hand, strict PQs always dequeue the highest-priority task, and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are sometimes so far from the head that the resulting schedule misses the benefit of priority-based scheduling. We propose a novel architecture for relaxing PQs without straying far from the priority-based schedule. Our chip-level architecture, called Snug, distributes the PQ into subqueues, and maintains a set of Work registers that point to the highest-priority task in each sub-queue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between obtaining an accurate global snapshot, and visiting only local subqueues to reduce traffic. Overall, Snug dequeues high-quality tasks while avoiding both hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation programs. On a simulated 64-core chip, Snug reduces the average execution time of the programs by 1.4X, 2.4X and 3.6X compared to state-of-the-art concurrent skip list, SprayList, and software-distributed PQs, respectively.\",\"PeriodicalId\":346687,\"journal\":{\"name\":\"Proceedings of the 34th ACM International Conference on Supercomputing\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-06-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 34th ACM International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3392717.3392740\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 34th ACM International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3392717.3392740","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在图形分析和仿真等领域，许多并行算法依赖于基于优先级的任务调度。在这种环境中，选择的数据结构是并发优先级队列(PQ)。不幸的是，PQ算法表现出一种不受欢迎的权衡。一方面，严格的PQs总是将最高优先级的任务从队列中分离出来，因此由于队列头部的争用而无法扩展。另一方面，放松的PQs通过将任务从队列中分离出来来避免争用，这些任务有时离头部太远，从而导致调度失去了基于优先级调度的好处。我们提出了一种新的架构来放松pq，而不会偏离基于优先级的时间表。我们的芯片级架构称为Snug，它将PQ分配到子队列中，并维护一组指向每个子队列中最高优先级任务的Work寄存器。Snug提供了一条指令，该指令选择要执行的高质量任务。该指令定期在获取精确的全局快照和仅访问本地子队列之间切换，以减少流量。总体而言，Snug在避免热点和过度网络流量的同时，解除了高质量任务的队列。我们用图形分析和事件模拟程序来评估Snug。在模拟的64核芯片上，与最先进的并发跳跃列表、SprayList和软件分布式pq相比，Snug分别将程序的平均执行时间减少了1.4倍、2.4倍和3.6倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors

Many parallel algorithms in domains such as graph analytics and simulations rely on priority-based task scheduling. In such environments, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On one hand, strict PQs always dequeue the highest-priority task, and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are sometimes so far from the head that the resulting schedule misses the benefit of priority-based scheduling. We propose a novel architecture for relaxing PQs without straying far from the priority-based schedule. Our chip-level architecture, called Snug, distributes the PQ into subqueues, and maintains a set of Work registers that point to the highest-priority task in each sub-queue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between obtaining an accurate global snapshot, and visiting only local subqueues to reduce traffic. Overall, Snug dequeues high-quality tasks while avoiding both hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation programs. On a simulated 64-core chip, Snug reduces the average execution time of the programs by 1.4X, 2.4X and 3.6X compared to state-of-the-art concurrent skip list, SprayList, and software-distributed PQs, respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 34th ACM International Conference on Supercomputing

自引率

0.00%

发文量