Datarol: a parallel machine architecture for fine-grain multithreading

Proceedings. Third Working Conference on Massively Parallel Programming Models (Cat. No.97TB100228). Pub Date: 1997-11-12. DOI: 10.1109/MPPM.1997.715971

M. Amamiya, H. Tomiyasu, S. Kusakabe


Citations: 5

Abstract


We discuss a design principle for a massively parallel distributed-memory multiprocessor architecture that solves the latency problem, and present the Datarol machine architecture. Latencies caused by remote memory access and remote procedure calls are the most serious problems in massively parallel computers. To eliminate the processor idle time caused by these latencies, processors must perform fast context switching among fine-grain concurrent processes. First, we present a processor architecture, called Datarol-II, that promotes efficient fine-grain multithreaded execution through fast context switching among fine-grain concurrent processes. In the Datarol-II processor, an implicit register load/store mechanism is embedded in the execution pipeline to reduce the memory access overhead caused by context switching. To reduce local memory access latency, a two-level hierarchical memory system and a load control mechanism are also introduced.

Then, we present a cost-effective design of the Datarol-II processor that incorporates an off-the-shelf high-end microprocessor while preserving the fine-grain dataflow concept. An off-the-shelf Pentium microprocessor is used for core processing, and a co-processor called FMP (Fine-grain Message Processor) is designed for fine-grain message handling and communication control. The FMP is designed on the basis of the FMD (Fine-grain Message Driven) execution model, in which fine-grain multithreaded execution is driven and controlled by simple fine-grain message communication.
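The abstract describes the FMD model only at a high level. The Python sketch below is a minimal illustration under stated assumptions, not the paper's design: each fine-grain thread is assumed to own a frame with a synchronization counter (a common dataflow-style mechanism), each message fills one operand slot and decrements that counter, and a thread becomes ready when its counter reaches zero. All names (`Frame`, `Scheduler`, `send`, `build_demo`), the one-time-unit cost per thread, and the 3-unit remote latency are hypothetical. The sketch shows how switching to other ready threads can hide the latency of a remote reply.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Tuple


@dataclass
class Frame:
    """Hypothetical thread frame: operand slots plus a synchronization counter."""
    name: str
    needed: int  # operands still missing; the thread is enabled when this hits 0
    body: Callable[["Scheduler", Dict[str, int]], None]
    slots: Dict[str, int] = field(default_factory=dict)


class Scheduler:
    """Toy single-processor, message-driven scheduler (illustrative, not the FMP)."""

    def __init__(self) -> None:
        self.frames: Dict[str, Frame] = {}
        self.ready: Deque[Frame] = deque()  # enabled threads, run in FIFO order
        # Pending messages: (arrival_time, seq, target, slot, value)
        self.in_flight: List[Tuple[int, int, str, str, int]] = []
        self.time = 0
        self.seq = 0   # tie-breaker so equal arrival times keep send order
        self.idle = 0  # time units spent with no ready thread
        self.trace: List[str] = []
        self.results: Dict[str, int] = {}

    def spawn(self, frame: Frame) -> None:
        self.frames[frame.name] = frame

    def send(self, target: str, slot: str, value: int, latency: int = 0) -> None:
        """Post a fine-grain message that arrives `latency` time units from now."""
        heapq.heappush(self.in_flight, (self.time + latency, self.seq, target, slot, value))
        self.seq += 1

    def _deliver(self, target: str, slot: str, value: int) -> None:
        frame = self.frames[target]
        frame.slots[slot] = value
        frame.needed -= 1
        if frame.needed == 0:
            self.ready.append(frame)  # last operand arrived: enable the thread

    def run(self) -> None:
        while self.ready or self.in_flight:
            # Deliver every message whose arrival time has been reached.
            while self.in_flight and self.in_flight[0][0] <= self.time:
                _, _, target, slot, value = heapq.heappop(self.in_flight)
                self._deliver(target, slot, value)
            if self.ready:
                # "Context switch": simply take the next enabled frame.
                frame = self.ready.popleft()
                self.trace.append(f"t={self.time}: run {frame.name}")
                frame.body(self, frame.slots)
                self.time += 1  # assumption: each thread takes one time unit
            elif self.in_flight:
                # Nothing is enabled, so the processor idles until the next arrival.
                next_arrival = self.in_flight[0][0]
                self.idle += next_arrival - self.time
                self.time = next_arrival


def add_xy(s: Scheduler, slots: Dict[str, int]) -> None:
    s.results["sum"] = slots["x"] + slots["y"]


def local_work(s: Scheduler, slots: Dict[str, int]) -> None:
    pass  # independent fine-grain work that needs no remote data


def produce_y(s: Scheduler, slots: Dict[str, int]) -> None:
    s.send("sum", "y", 32)  # local message with zero latency


def build_demo(with_other_work: bool) -> Scheduler:
    s = Scheduler()
    s.spawn(Frame("sum", needed=2, body=add_xy))
    s.send("sum", "x", 10, latency=3)  # hypothetical remote read: reply after 3 units
    if with_other_work:
        for name, body in (("w1", local_work), ("w2", local_work), ("w3", produce_y)):
            s.spawn(Frame(name, needed=1, body=body))
            s.send(name, "go", 0)
    else:
        s.send("sum", "y", 32)
    return s


if __name__ == "__main__":
    for flag in (True, False):
        sched = build_demo(flag)
        sched.run()
        print(flag, sched.trace, "idle =", sched.idle, "sum =", sched.results["sum"])
```

With `build_demo(True)`, the processor runs three independent threads (`w1`, `w2`, `w3`) while the remote reply to `sum` is in flight, so `idle` stays at 0; with `build_demo(False)` there is nothing else to run, and the processor idles for 3 time units before `sum` executes. Both runs compute 42. The toy model treats a context switch as free; in Datarol-II, the abstract attributes the reduction of context-switch memory access overhead to the implicit register load/store mechanism.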

