Modeling bounds on migration overhead for a traveling thread architecture

2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW). Published 2010-04-19. DOI: 10.1109/IPDPSW.2010.5470686

P. Fratta, P. Kogge

Citations: 3

Abstract

Heterogeneous multicore architectures have gained widespread use in the general purpose and scientific computing communities, and architects continue to investigate techniques for easing the burden of parallelization from the programmer. This paper presents a new class of heterogeneous multicores that leverages past work in architectures supporting the execution of traveling threads. These traveling threads execute on simple cores distributed across the chip and can move up the hierarchy and between cores based on data locality. This new design offers the benefits of improved performance at lower energy and power density than centralized counterparts through intelligent data placement and cooperative caching policies. We employ a methodology consisting of mathematical modeling and simulation to estimate the upper bounds on migration overhead for various architectural organizations. Results illustrate that the new architecture can match the performance of a conventional processor with reasonable thread sizes. We have observed that between 0.04 and 7.09 instructions per migration (IPM) (1.88 IPM on average) are sufficient to match the performance of the conventional processor. These results confirm that this distributed architecture and corresponding execution model offer promising potential in overcoming the design challenges of centralized counterparts.
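The abstract reports its results in instructions per migration (IPM) but does not state the underlying cost model. As a rough illustration only, the sketch below assumes a hypothetical linear model that is not taken from the paper: a conventional core spends `cpi_conv` cycles per instruction, a traveling-thread core executing near its data spends `cpi_tt` cycles per instruction, and every migration adds a fixed `mig_cycles`. Under those assumptions, matching the conventional core requires `cpi_tt + mig_cycles / IPM <= cpi_conv`. Solving for IPM gives a break-even threshold, and solving for `mig_cycles` gives an upper bound on tolerable migration overhead. All function and parameter names are hypothetical.

```python
import math

# Hypothetical linear cost model (an assumption, not the paper's model).

def ipm(instructions: int, migrations: int) -> float:
    """Instructions per migration: instructions executed divided by migrations."""
    if migrations == 0:
        return math.inf  # a thread that never migrates amortizes nothing
    return instructions / migrations

def cycles_conventional(instructions: int, cpi_conv: float) -> float:
    """Total cycles on the assumed conventional core."""
    return instructions * cpi_conv

def cycles_traveling(instructions: int, migrations: int,
                     cpi_tt: float, mig_cycles: float) -> float:
    """Total cycles on the assumed traveling-thread cores: execution plus migrations."""
    return instructions * cpi_tt + migrations * mig_cycles

def break_even_ipm(cpi_conv: float, cpi_tt: float, mig_cycles: float) -> float:
    """Smallest IPM satisfying cpi_tt + mig_cycles / IPM <= cpi_conv."""
    gain = cpi_conv - cpi_tt  # cycles saved per instruction by running near the data
    if gain < 0 or (gain == 0 and mig_cycles > 0):
        return math.inf  # no per-instruction gain to amortize migrations against
    if mig_cycles <= 0:
        return 0.0  # free migrations: any IPM matches
    return mig_cycles / gain

def max_migration_cycles(cpi_conv: float, cpi_tt: float, ipm_value: float) -> float:
    """Upper bound on per-migration overhead for a given IPM (same inequality)."""
    return max(0.0, ipm_value * (cpi_conv - cpi_tt))
```

For example, with the assumed values `cpi_conv = 3.0`, `cpi_tt = 1.0` and `mig_cycles = 4.0`, `break_even_ipm` returns 2.0, and a run of 1000 instructions with 500 migrations (IPM 2.0) takes 3000 cycles in both models. A linear model was chosen only because it makes the IPM threshold explicit; the paper's own modeling and simulation may account for effects this sketch ignores.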
