Azin Heidarshenas, Tanmay Gangwani, Serif Yesil, Adam Morrison, J. Torrellas
Proceedings of the 34th ACM International Conference on Supercomputing, published 2020-06-29. DOI: 10.1145/3392717.3392740 (https://doi.org/10.1145/3392717.3392740)
Snug: architectural support for relaxed concurrent priority queueing in chip multiprocessors
Many parallel algorithms in domains such as graph analytics and simulation rely on priority-based task scheduling. In such environments, the data structure of choice is a concurrent priority queue (PQ). Unfortunately, PQ algorithms exhibit an undesirable tradeoff. On the one hand, strict PQs always dequeue the highest-priority task and thus fail to scale because of contention at the head of the queue. On the other hand, relaxed PQs avoid contention by dequeuing tasks that are sometimes so far from the head that the resulting schedule loses the benefit of priority-based scheduling. We propose a novel architecture for relaxing PQs without straying far from the priority-based schedule. Our chip-level architecture, called Snug, distributes the PQ into subqueues and maintains a set of Work registers that point to the highest-priority task in each subqueue. Snug provides an instruction that picks a high-quality task to execute. The instruction periodically switches between obtaining an accurate global snapshot and visiting only local subqueues to reduce traffic. Overall, Snug dequeues high-quality tasks while avoiding both hotspots and excessive network traffic. We evaluate Snug on graph analytics and event simulation programs. On a simulated 64-core chip, Snug reduces the average execution time of the programs by 1.4X, 2.4X, and 3.6X compared to a state-of-the-art concurrent skip list, SprayList, and software-distributed PQs, respectively.
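The dequeue policy described above can be illustrated with a small single-threaded software model. This is only a sketch under stated assumptions, not the hardware design from the paper: the class name `SnugSketch`, the parameters `local_span` and `global_period`, the rule that "local" means a few subqueues starting at the requesting core's index, the fallback to a global scan when the local subqueues are empty, and the convention that a lower number means higher priority are all hypothetical. The abstract says the Work registers point to the head task of each subqueue; here each register simply caches the head's priority.

```python
import heapq


class SnugSketch:
    """Toy single-threaded model of a distributed, relaxed priority queue."""

    def __init__(self, num_subqueues, local_span=2, global_period=4):
        self.subqueues = [[] for _ in range(num_subqueues)]
        # One "Work register" per subqueue: priority of its head task, or None.
        self.work_regs = [None] * num_subqueues
        self.local_span = local_span        # hypothetical: how many subqueues count as local
        self.global_period = global_period  # hypothetical: every Nth pick is a global snapshot
        self.picks = 0

    def _refresh(self, q):
        # Keep the register in sync with the head of subqueue q.
        self.work_regs[q] = self.subqueues[q][0][0] if self.subqueues[q] else None

    def enqueue(self, q, priority, task):
        heapq.heappush(self.subqueues[q], (priority, task))
        self._refresh(q)

    def pick(self, core):
        """Return (priority, task) for the requesting core, or None if empty."""
        n = len(self.subqueues)
        self.picks += 1
        if self.picks % self.global_period == 0:
            candidates = list(range(n))  # global snapshot of all registers
        else:
            candidates = [(core + i) % n for i in range(self.local_span)]
        live = [q for q in candidates if self.work_regs[q] is not None]
        if not live:
            # Assumed fallback: widen the search when local subqueues are empty.
            live = [q for q in range(n) if self.work_regs[q] is not None]
        if not live:
            return None
        best = min(live, key=lambda q: self.work_regs[q])
        item = heapq.heappop(self.subqueues[best])
        self._refresh(best)
        return item


pq = SnugSketch(num_subqueues=4, local_span=2, global_period=2)
pq.enqueue(0, 5, "a")
pq.enqueue(1, 7, "b")
pq.enqueue(3, 1, "c")
first = pq.pick(core=0)   # local pick: sees only subqueues 0 and 1
second = pq.pick(core=0)  # global pick: sees all Work registers
third = pq.pick(core=0)
fourth = pq.pick(core=0)
```

In this run the first pick returns the task with priority 5 even though a priority-1 task exists in a remote subqueue, which is the relaxation; the next pick is a global snapshot and retrieves the priority-1 task, which bounds how far the schedule strays from strict priority order.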