RUBIC: Online Parallelism Tuning for Co-located Transactional Memory Applications

Proceedings of the 28th ACM Symposium on Parallelism in Algorithms and Architectures, Pub Date: 2016-07-11, DOI: 10.1145/2935764.2935770

Amin Mohtasham, J. Barreto


Citations: 4

Abstract

With the advent of Chip-Multiprocessors, Transactional Memory (TM) emerged as a powerful paradigm to simplify parallel programming. Unfortunately, as more cores become available in commodity systems, the scalability limits of a wide class of TM applications become more evident. Hence, online parallelism tuning techniques were proposed to adapt the number of threads of TM applications toward the optimal level. However, state-of-the-art solutions are exclusively tailored to single-process systems with relatively static workloads, and they exhibit pathological behaviors in scenarios where multiple multi-threaded TM processes contend for the shared hardware resources. This paper proposes RUBIC, a novel parallelism tuning method for TM applications in both single- and multi-process scenarios that overcomes the shortcomings of the previously proposed solutions. RUBIC helps the co-running processes adapt their parallelism level so that they can efficiently space-share the hardware. When compared to previous online parallelism tuning solutions, RUBIC achieves unprecedented system-wide fairness and efficiency, both in single- and multi-process scenarios. Our evaluation with different workloads and scenarios shows that, on average, RUBIC enhances the overall performance by 26% with respect to the best-performing state-of-the-art online parallelism tuning techniques in multi-process scenarios, while incurring negligible overhead in single-process cases. RUBIC also exhibits unique features in converging to a fair and efficient state.
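To make the term "online parallelism tuning" concrete, the following is a minimal sketch of a generic hill-climbing tuner for a single process. It is not RUBIC's algorithm, which the abstract does not detail; the names `tune_threads`, `measure`, and `toy_throughput`, as well as the toy throughput curve, are assumptions introduced purely for illustration.

```python
from typing import Callable


def tune_threads(measure: Callable[[int], float],
                 max_threads: int,
                 start: int = 1,
                 rounds: int = 20) -> int:
    """Generic hill climbing over the thread count (illustrative only).

    `measure(n)` is a hypothetical callback that returns the throughput
    observed while the application runs with n threads.
    """
    level = start
    best = measure(level)
    step = 1  # current search direction: +1 adds a thread, -1 removes one
    for _ in range(rounds):
        # Clamp the candidate to the valid range [1, max_threads].
        candidate = min(max(level + step, 1), max_threads)
        if candidate == level:
            # Hit a boundary: reverse direction and try again next round.
            step = -step
            continue
        observed = measure(candidate)
        if observed > best:
            # The move helped: accept it and keep the same direction.
            level, best = candidate, observed
        else:
            # The move did not help: stay put and reverse direction.
            step = -step
    return level


def toy_throughput(threads: int) -> float:
    """Hypothetical workload: scales up to 6 threads, then contention hurts."""
    return threads * 10.0 - max(0, threads - 6) * 25.0


if __name__ == "__main__":
    print(tune_threads(toy_throughput, max_threads=8))
```

A tuner of this kind only observes its own process. When several such processes share the same cores, each one's measurements are affected by the others' choices, which is the multi-process setting the abstract says existing single-process tuners handle poorly.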

