VLIW处理器的集群级同步多线程

2007 25th International Conference on Computer Design Pub Date : 2007-10-01 DOI:10.1109/ICCD.2007.4601890

Manoj Gupta, F. Sánchez, J. Llosa

{"title":"VLIW处理器的集群级同步多线程","authors":"Manoj Gupta, F. Sánchez, J. Llosa","doi":"10.1109/ICCD.2007.4601890","DOIUrl":null,"url":null,"abstract":"Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power. However, while some applications exhibit large amounts of instruction level parallelism (ILP) and benefit from very wide machines, others have little ILP, which wastes precious resources in wide processors. Simultaneous multithreading (SMT) is a well known technique that improves resource utilization by exploiting thread level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures. In this paper, we propose CSMT (cluster-level simultaneous multithreading) to allow some degree of SMT in clustered VLIW processors with minimal hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster (named bundle) as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardware- based cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over interleaved multithreading (IMT). In some cases, speedup can be as high as 228% over single thread architecture and 97% over IMT.","PeriodicalId":6306,"journal":{"name":"2007 25th International Conference on Computer Design","volume":"29 1","pages":"121-128"},"PeriodicalIF":0.0000,"publicationDate":"2007-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"8","resultStr":"{\"title\":\"Cluster-level simultaneous multithreading for VLIW processors\",\"authors\":\"Manoj Gupta, F. Sánchez, J. Llosa\",\"doi\":\"10.1109/ICCD.2007.4601890\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power. However, while some applications exhibit large amounts of instruction level parallelism (ILP) and benefit from very wide machines, others have little ILP, which wastes precious resources in wide processors. Simultaneous multithreading (SMT) is a well known technique that improves resource utilization by exploiting thread level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures. In this paper, we propose CSMT (cluster-level simultaneous multithreading) to allow some degree of SMT in clustered VLIW processors with minimal hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster (named bundle) as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardware- based cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over interleaved multithreading (IMT). In some cases, speedup can be as high as 228% over single thread architecture and 97% over IMT.\",\"PeriodicalId\":6306,\"journal\":{\"name\":\"2007 25th International Conference on Computer Design\",\"volume\":\"29 1\",\"pages\":\"121-128\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2007-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"8\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2007 25th International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2007.4601890\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2007 25th International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2007.4601890","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 8

摘要

集群VLIW嵌入式处理器由于硬件简单和功耗低的优点而变得广泛。然而，虽然一些应用程序表现出大量的指令级并行性(ILP)并受益于非常宽的机器，但其他应用程序只有很少的ILP，这浪费了宽处理器中的宝贵资源。同步多线程(SMT)是一种众所周知的技术，它通过在指令粒度级别上利用线程级别的并行性来提高资源利用率。然而，为vliw实现SMT需要复杂的结构。在本文中，我们提出了CSMT(集群级同步多线程)，以最小的硬件成本和复杂性在集群VLIW处理器中允许一定程度的SMT。CSMT将在给定集群(命名为bundle)中同时执行的一组操作视为分配单元。来自给定线程的属于VLIW指令的所有包将同时发出。为了减少线程间的集群冲突，提出了一种非常简单的基于硬件的集群重命名机制。实验结果表明，与其他适用于VLIW的多线程方法相比，CSMT显著提高了ILP。例如，使用4线程时，CSMT比单线程VLIW体系结构平均加速113%，比交错多线程(IMT)平均加速36%。在某些情况下，与单线程架构相比，加速可高达228%，与IMT相比可高达97%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Cluster-level simultaneous multithreading for VLIW processors

Clustered VLIW embedded processors have become widespread due to benefits of simple hardware and low power. However, while some applications exhibit large amounts of instruction level parallelism (ILP) and benefit from very wide machines, others have little ILP, which wastes precious resources in wide processors. Simultaneous multithreading (SMT) is a well known technique that improves resource utilization by exploiting thread level parallelism at the instruction grain level. However, implementing SMT for VLIWs requires complex structures. In this paper, we propose CSMT (cluster-level simultaneous multithreading) to allow some degree of SMT in clustered VLIW processors with minimal hardware cost and complexity. CSMT considers the set of operations that execute simultaneously in a given cluster (named bundle) as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardware- based cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP when compared with other multithreading approaches suited for VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over interleaved multithreading (IMT). In some cases, speedup can be as high as 228% over single thread architecture and 97% over IMT.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2007 25th International Conference on Computer Design

自引率

0.00%

发文量