Data marshaling for multi-core architectures

Proceedings of the 37th annual international symposium on Computer architecture. Pub Date: 2010-06-19. DOI: 10.1145/1815961.1816020

M. A. Suleman, O. Mutlu, José A. Joao, Khubaib, Y. Patt


Citations: 37

Abstract

Previous research has shown that Staged Execution (SE), i.e., dividing a program into segments and executing each segment at the core that has the data and/or functionality to best run that segment, can improve performance and save power. However, SE's benefit is limited because most segments access inter-segment data, i.e., data generated by the previous segment. When consecutive segments run on different cores, accesses to inter-segment data incur cache misses, thereby reducing performance. This paper proposes Data Marshaling (DM), a new technique to eliminate cache misses to inter-segment data. DM uses profiling to identify instructions that generate inter-segment data, and adds only 96 bytes/core of storage overhead. We show that DM significantly improves the performance of two promising Staged Execution models, Accelerated Critical Sections and producer-consumer pipeline parallelism, on both homogeneous and heterogeneous multi-core systems. In both models, DM can achieve almost all of the potential of ideally eliminating cache misses to inter-segment data. DM's performance benefit increases with the number of cores.
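To make the cost of inter-segment data concrete, the following is a minimal toy model and not the paper's mechanism. It assumes each core's cache can be modeled as a set of cache-line addresses, that a segment running on one core writes data which the next segment reads on another core, and that marshaling amounts to pushing the written lines to the next core before the next segment starts. The names `Core`, `run_producer`, `run_consumer`, `marshal`, and the 64-byte line size are all hypothetical choices made for illustration.

```python
# Toy model only: per-core caches are sets of cache-line addresses.
# All names and parameters here are hypothetical, not taken from the paper.

LINE_SIZE = 64  # bytes per cache line (assumed)


class Core:
    def __init__(self, name):
        self.name = name
        self.cache = set()  # cache-line addresses currently held
        self.misses = 0

    def write(self, addr):
        self.cache.add(addr // LINE_SIZE)

    def read(self, addr):
        line = addr // LINE_SIZE
        if line not in self.cache:
            self.misses += 1  # line would have to be fetched from elsewhere
            self.cache.add(line)


def run_producer(core, addrs):
    """First segment: generate inter-segment data; return the lines written."""
    written = set()
    for a in addrs:
        core.write(a)
        written.add(a // LINE_SIZE)
    return written


def marshal(lines, src, dst):
    """Assumed behaviour: move the recorded lines to the next segment's core."""
    for line in lines:
        src.cache.discard(line)
        dst.cache.add(line)


def run_consumer(core, addrs):
    """Second segment: read the data produced by the first segment."""
    for a in addrs:
        core.read(a)
    return core.misses


def simulate(use_marshaling):
    c0, c1 = Core("core0"), Core("core1")
    addrs = range(0, 512, 8)  # 64 words spanning 8 cache lines
    lines = run_producer(c0, addrs)
    if use_marshaling:
        marshal(lines, c0, c1)
    return run_consumer(c1, addrs)


if __name__ == "__main__":
    print("misses without marshaling:", simulate(False))
    print("misses with marshaling:", simulate(True))
```

In this sketch the consumer misses once per cache line when nothing is marshaled and not at all when the lines are pushed ahead of time. The model ignores timing, cache capacity, and coherence traffic, so it illustrates only where the misses come from, not the magnitude of any speedup.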
