Round Robin Thread Selection Optimization in Multithreaded Processors
Shane Carroll, Wei-Ming Lin
Parallel Process. Lett., published 2019-05-10
DOI: 10.1142/S0129626419500038 (https://doi.org/10.1142/S0129626419500038)
Abstract
We propose a variation of round-robin ordering in a multithreaded pipeline to increase system throughput and the fairness of resource distribution. We show that round robin with a typical arbitrary ordering uses shared resources inefficiently and leads to thread starvation. To address this while keeping a simple round-robin approach, we optimally and dynamically re-sort the round-robin order periodically at runtime. We show that with 4-threaded workloads, sorting the thread order at runtime improves throughput by over 9% and harmonic throughput by over 3%. We experiment with multiple stages of the pipeline and show consistent results across several experiments using the SPEC CPU 2006 benchmarks. Furthermore, since the technique is still a simple round robin, the performance gain requires little overhead to implement.
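The idea of a round robin whose order is periodically re-sorted can be illustrated with a minimal sketch. The abstract does not say which metric the order is sorted on, how long a period lasts, or which pipeline stage applies it, so everything below is an assumption made for illustration: the class name `SortedRoundRobin`, the `period` parameter, and the `occupancy` counter used as the sort key are all hypothetical, and this is not the authors' implementation.

```python
from typing import Callable, List


class SortedRoundRobin:
    """Round-robin selector whose visiting order is re-sorted every `period` picks."""

    def __init__(self, thread_ids: List[int], period: int,
                 sort_key: Callable[[int], float]) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.order = list(thread_ids)  # current round-robin order
        self.period = period           # picks between re-sorts (hypothetical parameter)
        self.sort_key = sort_key       # hypothetical metric; not given in the abstract
        self.pos = 0                   # index of the next thread to pick
        self.picks = 0                 # picks made since the last re-sort

    def next_thread(self) -> int:
        # Once a full period has elapsed, re-sort and restart the rotation.
        if self.picks == self.period:
            self.order.sort(key=self.sort_key)
            self.pos = 0
            self.picks = 0
        tid = self.order[self.pos]
        self.pos = (self.pos + 1) % len(self.order)
        self.picks += 1
        return tid


# Hypothetical per-thread shared-resource occupancy (lower is served first).
occupancy = {0: 30, 1: 5, 2: 12, 3: 20}
rr = SortedRoundRobin([0, 1, 2, 3], period=4, sort_key=lambda t: occupancy[t])

first_period = [rr.next_thread() for _ in range(4)]   # arbitrary initial order
second_period = [rr.next_thread() for _ in range(4)]  # order after one re-sort
print(first_period, second_period)  # [0, 1, 2, 3] [1, 2, 3, 0]
```

Between re-sorts the selector is an ordinary round robin, which is consistent with the abstract's point that the approach adds little overhead; only the periodic sort is extra work.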