Improving single-thread fetch performance on a multithreaded processor

J. Moure, R. B. García, Dolores Rexachs, E. Luque
Proceedings Euromicro Symposium on Digital Systems Design, 2001-09-04
DOI: 10.1109/DSD.2001.952344 (https://doi.org/10.1109/DSD.2001.952344)
Citations: 1

Abstract

Multithreaded processors, by simultaneously exploiting both the thread-level parallelism and the instruction-level parallelism of applications, achieve a higher instructions-per-cycle rate than single-thread processors. On a multi-thread workload, a clustered organization maximizes performance. On a single-thread workload, however, all but one of the clusters are idle, which degrades single-thread performance significantly. Using clustered multi-thread performance as a baseline, we propose and analyze several mechanisms and policies that improve single-thread execution by exploiting the existing hardware without a significant loss in multi-thread performance. We focus on the fetch unit, which is arguably the most performance-critical stage. Essentially, we analyze three ways of exploiting the idle fetch clusters: allowing a single thread to access its neighbor clusters, using the idle fetch clusters to provide multiple-path execution, or using them to widen the effective single-thread fetch block.
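
As a rough illustration of the last option (widening the effective fetch block), the toy model below computes an upper bound on the instructions one thread could fetch per cycle when idle fetch clusters are lent to it. This is only a sketch under stated assumptions: the cluster count, the per-cluster block width, and the names `FetchConfig` and `peak_fetch_width` are hypothetical and do not come from the paper, and the model ignores branches, cache misses, and any cross-cluster access cost.

```python
from dataclasses import dataclass


@dataclass
class FetchConfig:
    # Both values are hypothetical; the abstract gives no concrete sizes.
    clusters: int = 4     # number of fetch clusters
    block_width: int = 4  # instructions one cluster can fetch per cycle


def peak_fetch_width(cfg: FetchConfig, active_threads: int,
                     use_idle_clusters: bool) -> int:
    """Upper bound on instructions fetched per cycle for ONE thread.

    Assumes each active thread owns exactly one cluster. Idle clusters
    are lent out only when a single thread is running, which mirrors the
    single-thread scenario described in the abstract.
    """
    if active_threads < 1:
        raise ValueError("need at least one active thread")
    busy = min(active_threads, cfg.clusters)
    idle = cfg.clusters - busy
    own = cfg.block_width
    if use_idle_clusters and active_threads == 1:
        # The lone thread fetches from its own cluster plus every idle one.
        return own + idle * cfg.block_width
    return own


if __name__ == "__main__":
    cfg = FetchConfig()
    print(peak_fetch_width(cfg, active_threads=1, use_idle_clusters=False))
    print(peak_fetch_width(cfg, active_threads=1, use_idle_clusters=True))
    print(peak_fetch_width(cfg, active_threads=4, use_idle_clusters=True))
```

In this toy setting the per-thread bound is unchanged when every cluster is busy, which is the property the abstract asks for: better single-thread fetch without a significant multi-thread loss. Real gains would be well below the bound, since consecutive fetch blocks are rarely all useful.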