Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor

Proceedings. 25th Annual International Symposium on Computer Architecture (Cat. No.98CB36235). Pub Date: 1998-04-16. DOI: 10.1145/279358.279399

S. Keckler, W. Dally, D. Maskit, N. Carter, Andrew Chang, W. S. Lee


Citations: 88

Abstract

Much of the improvement in computer performance over the last twenty years has come from faster transistors and architectural advances that increase parallelism. Historically, parallelism has been exploited either at the instruction level with a grain-size of a single instruction or by partitioning applications into coarse threads with grain-sizes of thousands of instructions. Fine-grain threads fill the parallelism gap between these extremes by enabling tasks with run lengths as small as 20 cycles. As this fine-grain parallelism is orthogonal to ILP and coarse threads, it complements both methods and provides an opportunity for greater speedup. This paper describes the efficient communication and synchronization mechanisms implemented in the Multi-ALU Processor (MAP) chip, including a thread creation instruction, register communication, and a hardware barrier. These register-based mechanisms provide 10 times faster communication and 60 times faster synchronization than mechanisms that operate via a shared on-chip cache. With a three-processor implementation of the MAP, fine-grain speedups of 1.2-2.1 are demonstrated on a suite of applications.
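To illustrate why synchronization cost matters at this grain size, the following is a minimal back-of-the-envelope sketch and not a model taken from the paper. It assumes three equal 20-cycle tasks, one per processor, and a single lumped per-task overhead. The function name `speedup`, the 1-cycle "fast" overhead, and the simple cost model are all hypothetical; only the 20-cycle run length, the three processors, and the 60x ratio come from the abstract.

```python
def speedup(task_cycles: float, processors: int, overhead_cycles: float) -> float:
    """Speedup of running `processors` equal tasks in parallel versus serially.

    Assumed model (hypothetical, for illustration only):
      serial time   = processors * task_cycles
      parallel time = task_cycles + overhead_cycles
    where overhead_cycles lumps together communication and synchronization.
    """
    serial = processors * task_cycles
    parallel = task_cycles + overhead_cycles
    return serial / parallel


if __name__ == "__main__":
    TASK_CYCLES = 20   # smallest task run length quoted in the abstract
    PROCESSORS = 3     # three-processor MAP implementation

    # Hypothetical overheads: the 1-cycle figure is an assumption, chosen only
    # so that the 60x ratio from the abstract can be applied to something.
    fast_overhead = 1.0
    slow_overhead = 60 * fast_overhead

    for label, overhead in (("low overhead", fast_overhead),
                            ("60x overhead", slow_overhead)):
        print(f"{label}: speedup = {speedup(TASK_CYCLES, PROCESSORS, overhead):.2f}")
```

Under these assumed numbers, the low-overhead case stays close to the ideal threefold speedup, while the case with 60 times more overhead runs slower than the serial version. That is the qualitative point: tasks this short only pay off when communication and synchronization are cheap.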


