{"title":"LE1:支持硬件pthread的可参数化VLIW芯片多处理器","authors":"D. Stevens, V. Chouliaras","doi":"10.1109/ISVLSI.2010.107","DOIUrl":null,"url":null,"abstract":"We discuss LE1, a parameterized VLIW Chip Multiprocessor (CMP) adhering to the shared memory programmers model. LE1's novelty lies in its ability to perform dynamic thread-spawning through hardware support for PThread-like primitives in addition to its substantial architectural and microarchitectural parameterization. Dynamic (hardware) thread creation is very fast and removes the need for an executive/OS, presenting to the application programmer a 'bare-metal' multiprocessor, capable of exploiting all forms of parallelism. The core LE1 CPU is a configurable, 8-stage pipeline VLIW engine with a proprietary Instruction Set Architecture (ISA) supporting both partial and full predication and pipelined, multi-input, multi-output (MIMO) instruction extensions. The LE1 CMP is parameterizable as to the number of processors, their issue capability, internal microarchitectural features, functional unit mix and latency and the local memory system architecture. Preliminary results indicate near-linear performance improvement when executing a threaded version of the Mandelbrot calculation on 2-way and 4-way processor configurations with a 256 KB, 4-way banked tightly-coupled memory system. Similar trends are seen when executing a threaded matrix multiplication benchmark. We present these findings along with VLSI implementations of 4-way, dual-issue and 3-way, quad issue multiprocessor configurations.","PeriodicalId":187530,"journal":{"name":"2010 IEEE Computer Society Annual Symposium on VLSI","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"LE1: A Parameterizable VLIW Chip-Multiprocessor with Hardware PThreads Support\",\"authors\":\"D. Stevens, V. Chouliaras\",\"doi\":\"10.1109/ISVLSI.2010.107\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We discuss LE1, a parameterized VLIW Chip Multiprocessor (CMP) adhering to the shared memory programmers model. LE1's novelty lies in its ability to perform dynamic thread-spawning through hardware support for PThread-like primitives in addition to its substantial architectural and microarchitectural parameterization. Dynamic (hardware) thread creation is very fast and removes the need for an executive/OS, presenting to the application programmer a 'bare-metal' multiprocessor, capable of exploiting all forms of parallelism. The core LE1 CPU is a configurable, 8-stage pipeline VLIW engine with a proprietary Instruction Set Architecture (ISA) supporting both partial and full predication and pipelined, multi-input, multi-output (MIMO) instruction extensions. The LE1 CMP is parameterizable as to the number of processors, their issue capability, internal microarchitectural features, functional unit mix and latency and the local memory system architecture. Preliminary results indicate near-linear performance improvement when executing a threaded version of the Mandelbrot calculation on 2-way and 4-way processor configurations with a 256 KB, 4-way banked tightly-coupled memory system. Similar trends are seen when executing a threaded matrix multiplication benchmark. We present these findings along with VLSI implementations of 4-way, dual-issue and 3-way, quad issue multiprocessor configurations.\",\"PeriodicalId\":187530,\"journal\":{\"name\":\"2010 IEEE Computer Society Annual Symposium on VLSI\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-07-05\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE Computer Society Annual Symposium on VLSI\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISVLSI.2010.107\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE Computer Society Annual Symposium on VLSI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISVLSI.2010.107","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
LE1: A Parameterizable VLIW Chip-Multiprocessor with Hardware PThreads Support
We discuss LE1, a parameterized VLIW Chip Multiprocessor (CMP) adhering to the shared memory programmers model. LE1's novelty lies in its ability to perform dynamic thread-spawning through hardware support for PThread-like primitives in addition to its substantial architectural and microarchitectural parameterization. Dynamic (hardware) thread creation is very fast and removes the need for an executive/OS, presenting to the application programmer a 'bare-metal' multiprocessor, capable of exploiting all forms of parallelism. The core LE1 CPU is a configurable, 8-stage pipeline VLIW engine with a proprietary Instruction Set Architecture (ISA) supporting both partial and full predication and pipelined, multi-input, multi-output (MIMO) instruction extensions. The LE1 CMP is parameterizable as to the number of processors, their issue capability, internal microarchitectural features, functional unit mix and latency and the local memory system architecture. Preliminary results indicate near-linear performance improvement when executing a threaded version of the Mandelbrot calculation on 2-way and 4-way processor configurations with a 256 KB, 4-way banked tightly-coupled memory system. Similar trends are seen when executing a threaded matrix multiplication benchmark. We present these findings along with VLSI implementations of 4-way, dual-issue and 3-way, quad issue multiprocessor configurations.