A Hybrid Execution Model for Fine-Grained Languages on Distributed Memory Multicomputers
John Plevyak, V. Karamcheti, Xingbin Zhang, A. Chien
Proceedings of the IEEE/ACM SC95 Conference, December 1995. DOI: 10.1145/224170.224302
While fine-grained concurrent languages can naturally capture concurrency in many irregular and dynamic problems, their flexibility has generally resulted in poor execution efficiency. In such languages the computation consists of many small threads which are created dynamically and synchronized implicitly. In order to minimize the overhead of these operations, we propose a hybrid execution model which dynamically adapts to runtime data layout, providing both sequential efficiency and low-overhead parallel execution. This model uses separately optimized sequential and parallel versions of code. Sequential efficiency is obtained by dynamically coalescing threads via stack-based execution and parallel efficiency through latency hiding and cheap synchronization using heap-allocated activation frames. Novel aspects of the stack mechanism include handling return values for futures and executing forwarded messages (the responsibility to reply is passed along, like call/cc in Scheme) on the stack. In addition, the hybrid execution model is expressed entirely in C, and therefore is easily portable to many systems. Experiments with function-call intensive programs show that this model achieves sequential efficiency comparable to C programs. Experiments with regular and irregular application kernels on the CM5 and T3D demonstrate that it can yield 1.5 to 3 times better performance than code optimized for parallel execution alone.
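
The abstract's central idea, compiling each logical thread into both a stack-based sequential version and a version driven by heap-allocated activation frames, can be illustrated with a small sketch. The C program below is an illustrative approximation rather than the authors' runtime: fib_seq coalesces child threads into ordinary calls, while fib_par models the parallel version with heap-allocated frames and a join counter standing in for cheap synchronization. All names (frame_t, fib_seq, fib_par, deliver) are invented for this sketch, and the "spawn" is simulated by a local recursive call instead of a message to a remote processor.

```c
/*
 * Hypothetical sketch (not the paper's code) of the dual-version idea in the
 * hybrid execution model: the same logical thread body exists both as a plain
 * C function (stack-based, used when the computation stays local) and as a
 * heap-allocated activation frame that can be suspended and resumed (used
 * when work crosses processors or must overlap communication latency).
 */
#include <stdio.h>
#include <stdlib.h>

/* --- Sequential version: ordinary C call, the frame lives on the stack. --- */
static int fib_seq(int n)
{
    if (n < 2)
        return n;
    /* Dynamic thread coalescing: the "child threads" become direct calls. */
    return fib_seq(n - 1) + fib_seq(n - 2);
}

/* --- Parallel version: heap-allocated activation frame with a join counter. --- */
typedef struct frame {
    int n;                /* argument                                    */
    int result;           /* slot the children write their answers into  */
    int pending;          /* cheap synchronization: outstanding children */
    struct frame *parent; /* frame to resume once this one finishes      */
    int *dest;            /* where to deliver this frame's result        */
} frame_t;

static void deliver(frame_t *f, int value);

/* In a real runtime this would be scheduled by a message handler on another
 * processor; here the "spawn" is a local recursive call so the sketch stays
 * self-contained. */
static void fib_par(frame_t *f)
{
    if (f->n < 2) {
        deliver(f, f->n);
        return;
    }
    f->result = 0;
    f->pending = 2;

    frame_t *a = malloc(sizeof *a);
    frame_t *b = malloc(sizeof *b);
    *a = (frame_t){ .n = f->n - 1, .parent = f, .dest = &f->result };
    *b = (frame_t){ .n = f->n - 2, .parent = f, .dest = &f->result };
    fib_par(a);   /* conceptually: sent to another processor */
    fib_par(b);
}

/* A child "replies" into its parent's frame; the last reply resumes the parent. */
static void deliver(frame_t *f, int value)
{
    frame_t *p = f->parent;
    if (p == NULL) {
        printf("parallel:   %d\n", value);
    } else {
        *f->dest += value;
        if (--p->pending == 0)
            deliver(p, p->result);   /* run the parent's continuation */
    }
    free(f);
}

int main(void)
{
    printf("sequential: %d\n", fib_seq(10));

    frame_t *root = malloc(sizeof *root);
    *root = (frame_t){ .n = 10, .parent = NULL };
    /* A hybrid runtime would choose the stack version when the data layout is
     * local and fall back to the frame version otherwise; here we run both. */
    fib_par(root);
    return 0;
}
```

In this sketch the choice between the two versions is made trivially in main; in the model described by the paper, that decision adapts dynamically to the runtime data layout, which is where the reported 1.5 to 3 times improvement over purely parallel code comes from.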