{"title":"对移动线程架构的迁移开销进行建模","authors":"P. Fratta, P. Kogge","doi":"10.1109/IPDPSW.2010.5470686","DOIUrl":null,"url":null,"abstract":"Heterogeneous multicore architectures have gained widespread use in the general purpose and scientific computing communities, and architects continue to investigate techniques for easing the burden of parallelization from the programmer. This paper presents a new class of heterogeneous multicores that leverages past work in architectures supporting the execution of traveling threads. These traveling threads execute on simple cores distributed across the chip and can move up the hierarchy and between cores based on data locality. This new design offers the benefits of improved performance at lower energy and power density than centralized counterparts through intelligent data placement and cooperative caching policies. We employ a methodology consisting of mathematical modeling and simulation to estimate the upper bounds on migration overhead for various architectural organizations. Results illustrate that the new architecture can match the performance of a conventional processor with reasonable thread sizes. We have observed that between 0.04 and 7.09 instructions per migration (IPM) (1.88 IPM on average) are sufficient to match the performance of the conventional processor. These results confirm that this distributed architecture and corresponding execution model offer promising potential in overcoming the design challenges of centralized counterparts.","PeriodicalId":329280,"journal":{"name":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Modeling bounds on migration overhead for a traveling thread architecture\",\"authors\":\"P. Fratta, P. Kogge\",\"doi\":\"10.1109/IPDPSW.2010.5470686\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Heterogeneous multicore architectures have gained widespread use in the general purpose and scientific computing communities, and architects continue to investigate techniques for easing the burden of parallelization from the programmer. This paper presents a new class of heterogeneous multicores that leverages past work in architectures supporting the execution of traveling threads. These traveling threads execute on simple cores distributed across the chip and can move up the hierarchy and between cores based on data locality. This new design offers the benefits of improved performance at lower energy and power density than centralized counterparts through intelligent data placement and cooperative caching policies. We employ a methodology consisting of mathematical modeling and simulation to estimate the upper bounds on migration overhead for various architectural organizations. Results illustrate that the new architecture can match the performance of a conventional processor with reasonable thread sizes. We have observed that between 0.04 and 7.09 instructions per migration (IPM) (1.88 IPM on average) are sufficient to match the performance of the conventional processor. These results confirm that this distributed architecture and corresponding execution model offer promising potential in overcoming the design challenges of centralized counterparts.\",\"PeriodicalId\":329280,\"journal\":{\"name\":\"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)\",\"volume\":\"61 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPSW.2010.5470686\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2010.5470686","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Modeling bounds on migration overhead for a traveling thread architecture
Heterogeneous multicore architectures have gained widespread use in the general purpose and scientific computing communities, and architects continue to investigate techniques for easing the burden of parallelization from the programmer. This paper presents a new class of heterogeneous multicores that leverages past work in architectures supporting the execution of traveling threads. These traveling threads execute on simple cores distributed across the chip and can move up the hierarchy and between cores based on data locality. This new design offers the benefits of improved performance at lower energy and power density than centralized counterparts through intelligent data placement and cooperative caching policies. We employ a methodology consisting of mathematical modeling and simulation to estimate the upper bounds on migration overhead for various architectural organizations. Results illustrate that the new architecture can match the performance of a conventional processor with reasonable thread sizes. We have observed that between 0.04 and 7.09 instructions per migration (IPM) (1.88 IPM on average) are sufficient to match the performance of the conventional processor. These results confirm that this distributed architecture and corresponding execution model offer promising potential in overcoming the design challenges of centralized counterparts.