Boosting single-thread performance in multi-core systems through fine-grain multi-threading

Proceedings. International Symposium on Computer Architecture Pub Date : 2009-06-15 DOI:10.1145/1555754.1555813

C. Madriles, P. López, J. M. Codina, E. Gibert, Fernando Latorre, Alejandro Martínez, Raúl Martínez, Antonio González

{"title":"Boosting single-thread performance in multi-core systems through fine-grain multi-threading","authors":"C. Madriles, P. López, J. M. Codina, E. Gibert, Fernando Latorre, Alejandro Martínez, Raúl Martínez, Antonio González","doi":"10.1145/1555754.1555813","DOIUrl":null,"url":null,"abstract":"Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism (TLP), and even a small part with limited TLP impose important constraints to the global performance, as explained by Amdahl's law.\n In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application and they are executed under a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recovery to handle misspeculations.\n The proposed scheme outperforms previous hardware-only schemes to implement the idea of combining cores for executing single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance is improved by 41% on average when the proposed scheme is used on a Tiny Core, and up to 2.6x for some selected applications.","PeriodicalId":91388,"journal":{"name":"Proceedings. International Symposium on Computer Architecture","volume":"12 1","pages":"474-483"},"PeriodicalIF":0.0000,"publicationDate":"2009-06-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"28","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Symposium on Computer Architecture","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1555754.1555813","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 28

Abstract

Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single thread performance remains of paramount importance since some applications have limited thread-level parallelism (TLP), and even a small part with limited TLP impose important constraints to the global performance, as explained by Amdahl's law. In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application and they are executed under a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recovery to handle misspeculations. The proposed scheme outperforms previous hardware-only schemes to implement the idea of combining cores for executing single-thread applications in a multi-core design by more than 10% on average on Spec2006 for all configurations. Moreover, single-thread performance is improved by 41% on average when the proposed scheme is used on a Tiny Core, and up to 2.6x for some selected applications.

查看原文本刊更多论文

通过细粒度多线程提高多核系统中的单线程性能

行业已经转向多核设计，因为我们已经遇到了内存和功率的瓶颈。然而，单线程性能仍然是最重要的，因为一些应用程序具有有限的线程级并行性(TLP)，即使是具有有限TLP的一小部分也会对全局性能施加重要的约束，正如Amdahl定律所解释的那样。在本文中，我们提出了一种在多核设计中利用多核来提高单线程性能的新方法。所建议的技术具有一组新颖的硬件机制，支持在编译时生成的线程的执行。这些线程来自原始应用程序的细粒度推测分解，它们在修改后的多核系统下执行，其中包括:(1)支持多个版本的机制;(2)线程间违规检测机制;(3)重构原序列的机制;(4)检查体系结构状态和恢复以处理错误猜测的机制。在Spec2006上，对于所有配置，所提出的方案优于以前的纯硬件方案，以实现在多核设计中组合核来执行单线程应用程序的想法，平均优于10%以上。此外，当在Tiny Core上使用所提出的方案时，单线程性能平均提高了41%，对于某些选定的应用程序可提高2.6倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings. International Symposium on Computer Architecture

自引率

0.00%

发文量