Title: On the Relevance of Architectural Awareness for Efficient Fork/Join Support on Cluster-Based Manycores
Authors: H. Al-Khalissi, Mladen Berekovic, A. Marongiu
DOI: 10.1145/2613908.2613911 (https://doi.org/10.1145/2613908.2613911)
Journal (as listed in the record): Histoire & mesure, volume 5 1, pages 9-16
Publication date: 2014-06-15
Publication type: Journal Article
Source platform: Semanticscholar
Citation count: 2
Abstract
Several recent manycores leverage a hierarchical design, where small to medium numbers of cores are grouped inside clusters and enjoy low-latency, high-bandwidth local communication through fast L1 scratchpad memories. Several clusters can be interconnected through a network-on-chip (NoC), which ensures system scalability but introduces non-uniform memory access (NUMA) effects: the cost of accessing a specific memory location depends on the physical path that the corresponding transactions traverse. These peculiarities of the hardware must be carefully taken into account when designing support for programming models. In this paper we study how architectural awareness is key to supporting efficient and streamlined fork/join primitives. We compare hierarchical fork/join operations to "flat" ones, which have no notion of the hierarchical interconnection system, considering two real-world manycores: Intel SCC and STMicroelectronics STHORM.