Optimizing data sharing and address translation for the Cell BE Heterogeneous Chip Multiprocessor

2008 IEEE International Conference on Computer Design. Pub Date: 2008-10-01. DOI: 10.1109/ICCD.2008.4751904

M. Gschwind

Citations: 10

Abstract

Heterogeneous Chip Multiprocessors (HMPs), such as the Cell Broadband Engine, offer a new design optimization opportunity by allowing designers to provide accelerators for application-specific domains. Data sharing between CPUs and accelerators, and memory access mechanisms and protocols, are crucial decisions in the design of an HMP. In this article, we analyze the choice between hardware-managed and software-managed coherence between CPU and accelerators for DMA-based data sharing, and find that hardware-coherent DMA shows a performance benefit of up to 3x, even for simple workloads. We explore memory address translation architecture choices for DMA-based data sharing. In multiprogramming environments, address translation is commonly used to separate processes. For efficiency, direct access to system memory requires address translation capabilities in the accelerator. We find that hardware-managed address translation shows a performance benefit of up to 5x, even for simple workloads, by avoiding the costs of accelerator/CPU communication and of supervisor management of the translation context, as well as the introduction of a serial bottleneck on the CPU.
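The abstract attributes the benefit of hardware-managed translation to avoiding accelerator/CPU communication, supervisor handling, and serialization on the CPU. The toy model below is only a rough sketch of why serialization matters. The function names, parameters, and the lower-bound formula are assumptions made for illustration. They are not taken from the paper, and the model does not attempt to reproduce the reported 5x figure.

```python
# Toy cost model. All names and parameters are hypothetical, chosen only
# to illustrate the serial-bottleneck argument; none come from the paper.

def software_managed_time(n_accel, misses_per_accel, compute_cycles,
                          cpu_service_cycles, signal_cycles):
    """Lower bound on run time when the CPU resolves every translation miss."""
    # Round trip for one miss: signal the CPU, then wait for it to service the miss.
    per_miss = signal_cycles + cpu_service_cycles
    # Time one accelerator needs if the CPU is never busy with another request.
    uncontended = compute_cycles + misses_per_accel * per_miss
    # A single CPU services the misses of all accelerators one after another.
    cpu_busy = n_accel * misses_per_accel * cpu_service_cycles
    # The run cannot finish before either bound has elapsed.
    return max(uncontended, cpu_busy)


def hardware_managed_time(misses_per_accel, compute_cycles, walk_cycles):
    """Run time when each accelerator resolves its own misses locally."""
    # No CPU round trip: accelerators proceed independently of each other.
    return compute_cycles + misses_per_accel * walk_cycles


if __name__ == "__main__":
    # Made-up example values, not measurements.
    sw = software_managed_time(n_accel=8, misses_per_accel=100,
                               compute_cycles=10_000,
                               cpu_service_cycles=500, signal_cycles=200)
    hw = hardware_managed_time(misses_per_accel=100,
                               compute_cycles=10_000, walk_cycles=100)
    print("software-managed lower bound:", sw)
    print("hardware-managed estimate:", hw)
```

In this sketch the software-managed bound grows with the number of accelerators once the CPU is saturated, while the hardware-managed estimate does not depend on the accelerator count. That is the qualitative effect the abstract describes as a serial bottleneck.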
