BLQ: Light-Weight Locality-Aware Runtime for Blocking-Less Queuing
Qinzhe Wu, Ruihao Li, Jonathan Beard, L. K. John
International Conference on Compiler Construction, 2024-02-17, pp. 100-112. DOI: 10.1145/3640537.3641568
Message queues are widely used in parallel processing systems to synchronize worker threads. When the throughput of upstream and downstream tasks is mismatched, the queue buffer is often either empty or full. Polling an empty queue (downstream) or a full queue (upstream) hurts thread performance, since those polling cycles could have been spent on other computation. Non-blocking queues are an alternative that lets applications choose to spend those polling cycles on other tasks. However, application programmers should not have to bear this burden, because a good decision about what to do upon blocking must take a great deal of runtime environment information into account. This paper proposes the Blocking-Less Queuing Runtime (BLQ), a systematic solution that finds appropriate strategies at (or before) blocking while lightening the programmers' burden. BLQ brings together a set of solutions, including yielding, advanced dynamic queue buffer resizing, and resource-aware task scheduling. Evaluation on high-end servers shows that, with BLQ, a diverse set of parallel queuing workloads experience less blocking and fewer cache misses. BLQ considerably outperforms the baseline runtime, with a peak speedup of up to 3.8×.
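To make the contrast between blocking and non-blocking queues concrete, here is a minimal Python sketch. It is not BLQ's implementation. Its operations return immediately instead of waiting, so the caller decides what to do when the queue is full or empty. The class `ResizableQueue`, the helper `push_or_yield`, the capacity-doubling rule, and the `on_block` callback are all hypothetical. The doubling rule is a naive stand-in for BLQ's resizing logic, which is not described here, and `on_block` stands in for whatever a runtime might do at blocking time, such as yielding or scheduling another task.

```python
import threading
from collections import deque


class ResizableQueue:
    """Hypothetical FIFO with non-blocking operations (illustrative only).

    Shows two strategies named in the abstract, growing the buffer and
    handing control back to the caller, under assumed policies.
    """

    def __init__(self, capacity: int = 4, max_capacity: int = 64):
        self._buf = deque()
        self.capacity = capacity
        self.max_capacity = max_capacity
        self._lock = threading.Lock()

    def try_push(self, item) -> bool:
        """Enqueue without blocking; grow the buffer when full (assumed policy)."""
        with self._lock:
            if len(self._buf) >= self.capacity:
                if self.capacity >= self.max_capacity:
                    return False  # still full: the caller chooses a strategy
                # Assumed growth rule: double the capacity, capped at max_capacity.
                self.capacity = min(self.capacity * 2, self.max_capacity)
            self._buf.append(item)
            return True

    def try_pop(self):
        """Dequeue without blocking; returns (True, item) or (False, None)."""
        with self._lock:
            if not self._buf:
                return False, None
            return True, self._buf.popleft()


def push_or_yield(q: ResizableQueue, item, on_block) -> None:
    """Retry a push, running caller-supplied work instead of spinning idly."""
    while not q.try_push(item):
        on_block()  # e.g. run another ready task, or time.sleep(0) to yield
```

For example, with `capacity=2` and `max_capacity=4`, the first four pushes succeed and the buffer grows once. The fifth push returns `False`, and `push_or_yield` can then use `on_block` to drain an item before retrying. Moving this decision out of each call site and into a runtime is the shift of burden the abstract argues for.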