LE1: A Parameterizable VLIW Chip-Multiprocessor with Hardware PThreads Support

D. Stevens, V. Chouliaras
Published in: 2010 IEEE Computer Society Annual Symposium on VLSI, 2010-07-05. DOI: 10.1109/ISVLSI.2010.107 (https://doi.org/10.1109/ISVLSI.2010.107)
Citations: 10

Abstract

We discuss LE1, a parameterized VLIW Chip Multiprocessor (CMP) that adheres to the shared-memory programmer's model. LE1's novelty lies in its ability to spawn threads dynamically through hardware support for PThread-like primitives, in addition to its substantial architectural and microarchitectural parameterization. Dynamic (hardware) thread creation is very fast and removes the need for an executive/OS, presenting the application programmer with a 'bare-metal' multiprocessor capable of exploiting all forms of parallelism. The core LE1 CPU is a configurable, 8-stage pipelined VLIW engine with a proprietary Instruction Set Architecture (ISA) that supports both partial and full predication as well as pipelined, multi-input, multi-output (MIMO) instruction extensions. The LE1 CMP is parameterizable in the number of processors, their issue capability, internal microarchitectural features, functional-unit mix and latency, and the local memory system architecture. Preliminary results indicate near-linear performance improvement when executing a threaded version of the Mandelbrot calculation on 2-way and 4-way processor configurations with a 256 KB, 4-way banked, tightly-coupled memory system. Similar trends are seen when executing a threaded matrix multiplication benchmark. We present these findings along with VLSI implementations of 4-way, dual-issue and 3-way, quad-issue multiprocessor configurations.
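The abstract does not give the LE1 thread API or the benchmark source, so the following is only a minimal sketch of what a threaded Mandelbrot workload in the shared-memory model might look like. It uses Python's standard `threading` module as a stand-in for PThread-like create/join primitives. The image size, iteration limit, row-interleaved partitioning, and all function names are assumptions made for illustration, not the authors' method. Because of CPython's global interpreter lock, this shows the work partitioning only, not the speedup reported for the hardware.

```python
import threading

# Illustrative sizes only; none of these values come from the paper.
WIDTH, HEIGHT, MAX_ITER = 64, 32, 50


def escape_time(cx, cy, max_iter=MAX_ITER):
    """Count iterations of z = z*z + c before |z| exceeds 2."""
    zx = zy = 0.0
    for i in range(max_iter):
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
        if zx * zx + zy * zy > 4.0:
            return i
    return max_iter


def worker(tid, num_threads, image):
    """Compute rows tid, tid + num_threads, ... of the shared image."""
    # Each thread owns a disjoint set of rows, so no locking is needed.
    for row in range(tid, HEIGHT, num_threads):
        cy = -1.0 + 2.0 * row / (HEIGHT - 1)
        image[row] = [
            escape_time(-2.0 + 3.0 * col / (WIDTH - 1), cy)
            for col in range(WIDTH)
        ]


def render(num_threads):
    """Spawn num_threads workers over one shared image and wait for all."""
    image = [None] * HEIGHT  # shared memory visible to every thread
    threads = [
        threading.Thread(target=worker, args=(tid, num_threads, image))
        for tid in range(num_threads)
    ]
    for t in threads:
        t.start()  # plays the role of pthread_create
    for t in threads:
        t.join()   # plays the role of pthread_join
    return image
```

Calling `render(1)` and `render(4)` yields the same image, since the partitioning changes only which thread computes each row, not the per-pixel arithmetic. Each pixel is independent of the others, which is the property that makes near-linear scaling plausible for this kind of workload.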