SMT和CMP架构中线程级推测的效率——性能、功率和热的观点

2008 IEEE International Conference on Computer Design Pub Date : 2008-12-01 DOI:10.1109/ICCD.2008.4751875

Venkatesan Packirisamy, Yangchun Luo, W. Hung, Antonia Zhai, P. Yew, Tin-fook Ngai

{"title":"SMT和CMP架构中线程级推测的效率——性能、功率和热的观点","authors":"Venkatesan Packirisamy, Yangchun Luo, W. Hung, Antonia Zhai, P. Yew, Tin-fook Ngai","doi":"10.1109/ICCD.2008.4751875","DOIUrl":null,"url":null,"abstract":"Computer industry has adopted multi-threaded and multi-core architectures as the clock rate increase stalled in early 2000psilas. However, because of the lack of compilers and other related software technologies, most of the general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workload is not well understood. The TLS workload due to speculative nature of the threads which could potentially be rollbacked and due to variable degree of parallelism available in applications, exhibits unique characteristics which makes it different from other workloads. In this paper, we present a detailed study of the performance, power consumption and thermal effect of these multithreaded architectures against that of a Superscalar with equal chip area. A wide spectrum of design choices and tradeoffs are also studied using commonly used simulation techniques. We show that the SMT based TLS architecture performs about 21% better than the best CMP based configuration while it suffers about 16% power overhead. In terms of Energy-Delay-Squared product (ED2), SMT based TLS performs about 26% better than the best CMP based TLS configuration and 11% better than the superscalar architecture. But the SMT based TLS configuration, causes more thermal stress than the CMP based TLS architectures.","PeriodicalId":345501,"journal":{"name":"2008 IEEE International Conference on Computer Design","volume":"18 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective\",\"authors\":\"Venkatesan Packirisamy, Yangchun Luo, W. Hung, Antonia Zhai, P. Yew, Tin-fook Ngai\",\"doi\":\"10.1109/ICCD.2008.4751875\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Computer industry has adopted multi-threaded and multi-core architectures as the clock rate increase stalled in early 2000psilas. However, because of the lack of compilers and other related software technologies, most of the general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workload is not well understood. The TLS workload due to speculative nature of the threads which could potentially be rollbacked and due to variable degree of parallelism available in applications, exhibits unique characteristics which makes it different from other workloads. In this paper, we present a detailed study of the performance, power consumption and thermal effect of these multithreaded architectures against that of a Superscalar with equal chip area. A wide spectrum of design choices and tradeoffs are also studied using commonly used simulation techniques. We show that the SMT based TLS architecture performs about 21% better than the best CMP based configuration while it suffers about 16% power overhead. In terms of Energy-Delay-Squared product (ED2), SMT based TLS performs about 26% better than the best CMP based TLS configuration and 11% better than the superscalar architecture. But the SMT based TLS configuration, causes more thermal stress than the CMP based TLS architectures.\",\"PeriodicalId\":345501,\"journal\":{\"name\":\"2008 IEEE International Conference on Computer Design\",\"volume\":\"18 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Conference on Computer Design\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCD.2008.4751875\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Conference on Computer Design","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCD.2008.4751875","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

计算机行业已经采用了多线程和多核架构，因为时钟速率在2000年初停止了增长。然而，由于缺乏编译器和其他相关软件技术，目前大多数通用应用程序仍然无法利用这种体系结构来提高其性能。线程级推测(TLS)已经被提出作为使用这些多线程架构来并行化通用应用程序的一种方法。同时多线程(SMT)和芯片多处理器(CMP)都已经扩展到实现TLS。虽然SMT和CMP在多编程和并行工作负载下的特性已经得到了广泛的研究，但它们在TLS工作负载下的行为却没有得到很好的理解。由于可能被回滚的线程的推测性质以及应用程序中可用的不同程度的并行性，TLS工作负载表现出独特的特征，使其与其他工作负载不同。在本文中，我们详细研究了这些多线程架构与具有相同芯片面积的超标量架构的性能，功耗和热效应。广泛的设计选择和权衡也研究使用常用的仿真技术。我们表明，基于SMT的TLS架构的性能比最佳的基于CMP的配置好21%，而它的功耗开销约为16%。在能量-延迟-平方积(ED2)方面，基于SMT的TLS比最佳的基于CMP的TLS配置性能好26%，比标量架构性能好11%。但是基于SMT的TLS配置比基于CMP的TLS架构产生更多的热应力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficiency of thread-level speculation in SMT and CMP architectures - performance, power and thermal perspective

Computer industry has adopted multi-threaded and multi-core architectures as the clock rate increase stalled in early 2000psilas. However, because of the lack of compilers and other related software technologies, most of the general-purpose applications today still cannot take advantage of such architectures to improve their performance. Thread-level speculation (TLS) has been proposed as a way of using these multi-threaded architectures to parallelize general-purpose applications. Both simultaneous multithreading (SMT) and chip multiprocessors (CMP) have been extended to implement TLS. While the characteristics of SMT and CMP have been widely studied under multi-programmed and parallel workloads, their behavior under TLS workload is not well understood. The TLS workload due to speculative nature of the threads which could potentially be rollbacked and due to variable degree of parallelism available in applications, exhibits unique characteristics which makes it different from other workloads. In this paper, we present a detailed study of the performance, power consumption and thermal effect of these multithreaded architectures against that of a Superscalar with equal chip area. A wide spectrum of design choices and tradeoffs are also studied using commonly used simulation techniques. We show that the SMT based TLS architecture performs about 21% better than the best CMP based configuration while it suffers about 16% power overhead. In terms of Energy-Delay-Squared product (ED2), SMT based TLS performs about 26% better than the best CMP based TLS configuration and 11% better than the superscalar architecture. But the SMT based TLS configuration, causes more thermal stress than the CMP based TLS architectures.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 IEEE International Conference on Computer Design

自引率

0.00%

发文量