Fence-free work stealing on bounded TSO processors

Adam Morrison, Y. Afek
{"title":"Fence-free work stealing on bounded TSO processors","authors":"Adam Morrison, Y. Afek","doi":"10.1145/2541940.2541987","DOIUrl":null,"url":null,"abstract":"Work stealing is the method of choice for load balancing in task parallel programming languages and frameworks. Yet despite considerable effort invested in optimizing work stealing task queues, existing algorithms issue a costly memory fence when removing a task, and these fences are believed to be necessary for correctness. This paper refutes this belief, demonstrating work stealing algorithms in which a worker does not issue a memory fence for microarchitectures with a bounded total store ordering (TSO) memory model. Bounded TSO is a novel restriction of TSO~-- capturing mainstream x86 and SPARC TSO processors -- that bounds the number of stores a load can be reordered with. Our algorithms eliminate the memory fence penalty, improving the running time of a suite of parallel benchmarks on modern x86 multicore processors by 7%-11% on average (and up to 23%), compared to the Cilk and Chase-Lev work stealing queues.","PeriodicalId":128805,"journal":{"name":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"25","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th international conference on Architectural support for programming languages and operating systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2541940.2541987","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 25

Abstract

Work stealing is the method of choice for load balancing in task parallel programming languages and frameworks. Yet despite considerable effort invested in optimizing work stealing task queues, existing algorithms issue a costly memory fence when removing a task, and these fences are believed to be necessary for correctness. This paper refutes this belief, demonstrating work stealing algorithms in which a worker does not issue a memory fence for microarchitectures with a bounded total store ordering (TSO) memory model. Bounded TSO is a novel restriction of TSO~-- capturing mainstream x86 and SPARC TSO processors -- that bounds the number of stores a load can be reordered with. Our algorithms eliminate the memory fence penalty, improving the running time of a suite of parallel benchmarks on modern x86 multicore processors by 7%-11% on average (and up to 23%), compared to the Cilk and Chase-Lev work stealing queues.
在有界的TSO处理器上进行无栅栏工作窃取
工作窃取是任务并行编程语言和框架中负载平衡的首选方法。然而,尽管在优化工作窃取任务队列方面投入了大量精力,但现有算法在删除任务时会产生代价高昂的内存围栏,而这些围栏被认为是正确性所必需的。本文驳斥了这一观点,展示了工作窃取算法,其中工作人员不会为具有有限总存储排序(TSO)内存模型的微体系结构发出内存围栏。有界TSO是TSO~的一种新限制——捕获主流x86和SPARC TSO处理器——它限制了一个负载可以重新排序的存储数量。我们的算法消除了内存障碍,与Cilk和Chase-Lev的偷工作队列相比,在现代x86多核处理器上,一套并行基准测试的运行时间平均提高了7%-11%(最高可提高23%)。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信