Pthreads for Dynamic and Irregular Parallelism

G. Narlikar, G. Blelloch
{"title":"Pthreads for Dynamic and Irregular Parallelism","authors":"G. Narlikar, G. Blelloch","doi":"10.1109/SC.1998.10005","DOIUrl":null,"url":null,"abstract":"High performance applications on shared memory machines have typically been written in a coarse grained style, with one heavyweight thread per processor. In comparison, programming with a large number of lightweight, parallel threads has several advantages, including simpler coding for programs with irregular and dynamic parallelism, and better adaptability to a changing number of processors. The programmer can express a new thread to execute each individual parallel task; the implementation dynamically creates and schedules these threads onto the processors, and effectively balances the load. However, unless the threads scheduler is designed carefully, the parallel program may suffer poor space and time performance. In this paper, we study the performance of a native, lightweight POSIX threads (Pthreads) library on a shared memory machine running Solaris; to our knowledge, the Solaris library is one of the most efficient user-level implementations of the Pthreads standard available today. To evaluate this Pthreads implementation, we use a set of parallel programs that dynamically create a large number of threads. The programs include dense and sparse matrix multiplies, two N-body codes, a data classifier, a volume rendering benchmark, and a high performance FFT package. We find the existing threads scheduler to be unsuitable for executing such programs. We show how simple modifications to the Pthreads scheduler can result in significantly improved space and time performance for the programs; the modified scheduler results in as much as 44% less running time and 63% less memory requirement compared to the original Pthreads implementation. Our results indicate that, provided we use a good scheduler, the rich functionality and standard API of Pthreads can be combined with the advantages of dynamic, lightweight threads to result in high performance.","PeriodicalId":113978,"journal":{"name":"Proceedings of the IEEE/ACM SC98 Conference","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1998-11-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE/ACM SC98 Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.1998.10005","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 43

Abstract

High performance applications on shared memory machines have typically been written in a coarse grained style, with one heavyweight thread per processor. In comparison, programming with a large number of lightweight, parallel threads has several advantages, including simpler coding for programs with irregular and dynamic parallelism, and better adaptability to a changing number of processors. The programmer can express a new thread to execute each individual parallel task; the implementation dynamically creates and schedules these threads onto the processors, and effectively balances the load. However, unless the threads scheduler is designed carefully, the parallel program may suffer poor space and time performance. In this paper, we study the performance of a native, lightweight POSIX threads (Pthreads) library on a shared memory machine running Solaris; to our knowledge, the Solaris library is one of the most efficient user-level implementations of the Pthreads standard available today. To evaluate this Pthreads implementation, we use a set of parallel programs that dynamically create a large number of threads. The programs include dense and sparse matrix multiplies, two N-body codes, a data classifier, a volume rendering benchmark, and a high performance FFT package. We find the existing threads scheduler to be unsuitable for executing such programs. We show how simple modifications to the Pthreads scheduler can result in significantly improved space and time performance for the programs; the modified scheduler results in as much as 44% less running time and 63% less memory requirement compared to the original Pthreads implementation. Our results indicate that, provided we use a good scheduler, the rich functionality and standard API of Pthreads can be combined with the advantages of dynamic, lightweight threads to result in high performance.
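To make the programming style concrete, the following is a minimal sketch (not taken from the paper) of the fine-grained, dynamic parallelism the abstract describes: the programmer spawns one lightweight Pthread per parallel task and leaves it to the threads library to create, schedule, and load-balance them across processors. The divide-and-conquer sum, the CUTOFF threshold, and all identifiers are illustrative assumptions, not code from the authors' benchmarks.

/*
 * Illustrative sketch (assumed, not from the paper): one Pthread per
 * parallel task in a divide-and-conquer array sum. Each large task
 * spawns two child threads, one per half, and the Pthreads scheduler
 * maps the resulting tree of lightweight threads onto the processors.
 * Compile with: cc -O2 -pthread sum.c
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define CUTOFF 10000            /* below this size, sum sequentially */

struct task {
    const long *data;           /* input slice */
    size_t      len;            /* number of elements in the slice */
    long        sum;            /* result written by the worker */
};

static void *sum_task(void *arg)
{
    struct task *t = arg;

    if (t->len <= CUTOFF) {     /* small task: compute directly */
        long s = 0;
        for (size_t i = 0; i < t->len; i++)
            s += t->data[i];
        t->sum = s;
        return NULL;
    }

    /* Large task: split in two and spawn a new thread for each half. */
    size_t half = t->len / 2;
    struct task left  = { t->data,        half,          0 };
    struct task right = { t->data + half, t->len - half, 0 };

    pthread_t lt, rt;
    pthread_create(&lt, NULL, sum_task, &left);
    pthread_create(&rt, NULL, sum_task, &right);
    pthread_join(lt, NULL);
    pthread_join(rt, NULL);

    t->sum = left.sum + right.sum;
    return NULL;
}

int main(void)
{
    size_t n = 1 << 20;
    long *a = malloc(n * sizeof *a);
    for (size_t i = 0; i < n; i++)
        a[i] = 1;

    struct task root = { a, n, 0 };
    sum_task(&root);            /* runs the whole tree of threads */
    printf("sum = %ld\n", root.sum);

    free(a);
    return 0;
}

With a cutoff this small and a large input, the program creates far more threads than processors, which is exactly the regime the paper studies: correctness is easy to express, but the space and time behavior then depends on how the threads scheduler orders and stacks the outstanding threads.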