A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers

Proceedings of the IEEE/ACM SC95 Conference Pub Date : 1995-12-08 DOI:10.1145/224170.224302

John Plevyak, V. Karamcheti, Xingbin Zhang, A. Chien


Citations: 44

Abstract

While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. To minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution, and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call-intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.
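The two-version idea in the abstract can be illustrated with a minimal C sketch. It is not the authors' implementation: the names `Node`, `Frame`, `seq_sum`, `par_sum`, `spawn`, `reply`, `hybrid_sum`, and `frames_allocated` are all hypothetical, the `local` flag merely stands in for "the data lives on this processor", and message delivery is simulated synchronously. A sequential version runs on the C stack and reports failure when it would have to block; only then does a parallel version allocate a heap frame with a join counter. Unlike a real runtime, which would presumably continue from the point of blocking, this sketch simply retries the subtree in the parallel version.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical tree node; `local` is an assumed stand-in for data locality. */
typedef struct Node {
    int value;
    int local;
    struct Node *left, *right;
} Node;

/* Heap-allocated activation frame used only by the parallel version. */
typedef struct Frame {
    struct Frame *parent; /* frame to reply to, or NULL for the root */
    long *out;            /* where the root frame stores the final result */
    long sum;             /* partial result accumulated so far */
    int pending;          /* join counter: replies still outstanding */
} Frame;

int frames_allocated = 0; /* instrumentation for this sketch only */

static void par_sum(const Node *n, Frame *parent, long *out);

/* Sequential version: plain stack recursion. Returns 1 and sets *out on
   success, or 0 if it met non-local data and would have had to block. */
static int seq_sum(const Node *n, long *out) {
    long l, r;
    if (n == NULL) { *out = 0; return 1; }
    if (!n->local) return 0;
    if (!seq_sum(n->left, &l) || !seq_sum(n->right, &r)) return 0;
    *out = n->value + l + r;
    return 1;
}

/* Deliver a value to a frame; the last reply completes the frame and
   passes its total on to the parent (or to the root's result slot). */
static void reply(Frame *f, long v) {
    assert(f->pending > 0);
    f->sum += v;
    if (--f->pending == 0) {
        Frame *parent = f->parent;
        long *out = f->out;
        long total = f->sum;
        free(f);
        if (parent) reply(parent, total);
        else if (out) *out = total;
    }
}

/* Start a child computation: stack fast path first, heap frame otherwise. */
static void spawn(const Node *n, Frame *f) {
    long v;
    if (seq_sum(n, &v)) reply(f, v);
    else par_sum(n, f, NULL);
}

/* Parallel version: a real system would send a message to the owner of n;
   here the remote access is simulated by reading n directly. */
static void par_sum(const Node *n, Frame *parent, long *out) {
    Frame *f = malloc(sizeof *f);
    if (f == NULL) abort();
    frames_allocated++;
    f->parent = parent;
    f->out = out;
    f->sum = n->value;
    f->pending = 3; /* two children plus a guard held until both are spawned */
    spawn(n->left, f);
    spawn(n->right, f);
    reply(f, 0);    /* release the guard */
}

/* Entry point: try the stack-based version, fall back to heap frames. */
long hybrid_sum(const Node *n) {
    long v = 0;
    if (!seq_sum(n, &v)) par_sum(n, NULL, &v);
    return v;
}
```

When every node is local, `hybrid_sum` completes without allocating any frame, which is the sequential-efficiency case the abstract describes. Marking one node as non-local forces heap frames only along the path from the root to that node, while its local siblings still run on the stack.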


