Zehra Sura, A. Jacob, Tong Chen, Bryan S. Rosenburg, Olivier Sallenave, C. Bertolli, S. Antão, J. Brunheroto, Yoonho Park, K. O'Brien, R. Nair
{"title":"内存处理系统中的数据访问优化","authors":"Zehra Sura, A. Jacob, Tong Chen, Bryan S. Rosenburg, Olivier Sallenave, C. Bertolli, S. Antão, J. Brunheroto, Yoonho Park, K. O'Brien, R. Nair","doi":"10.1145/2742854.2742863","DOIUrl":null,"url":null,"abstract":"The Active Memory Cube (AMC) system is a novel heterogeneous computing system concept designed to provide high performance and power-efficiency across a range of applications. The AMC architecture includes general-purpose host processors and specially designed in-memory processors (processing lanes) that would be integrated in a logic layer within 3D DRAM memory. The processing lanes have large vector register files but no power-hungry caches or local memory buffers. Performance depends on how well the resulting higher effective memory latency within the AMC can be managed. In this paper, we describe a combination of programming language features, compiler techniques, operating system interfaces, and hardware design that can effectively hide memory latency for the processing lanes in an AMC system. We present experimental data to show how this approach improves the performance of a set of representative benchmarks important in high performance computing applications. As a result, we are able to achieve high performance together with power efficiency using the AMC architecture.","PeriodicalId":417279,"journal":{"name":"Proceedings of the 12th ACM International Conference on Computing Frontiers","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"67","resultStr":"{\"title\":\"Data access optimization in a processing-in-memory system\",\"authors\":\"Zehra Sura, A. Jacob, Tong Chen, Bryan S. Rosenburg, Olivier Sallenave, C. Bertolli, S. Antão, J. Brunheroto, Yoonho Park, K. O'Brien, R. Nair\",\"doi\":\"10.1145/2742854.2742863\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Active Memory Cube (AMC) system is a novel heterogeneous computing system concept designed to provide high performance and power-efficiency across a range of applications. The AMC architecture includes general-purpose host processors and specially designed in-memory processors (processing lanes) that would be integrated in a logic layer within 3D DRAM memory. The processing lanes have large vector register files but no power-hungry caches or local memory buffers. Performance depends on how well the resulting higher effective memory latency within the AMC can be managed. In this paper, we describe a combination of programming language features, compiler techniques, operating system interfaces, and hardware design that can effectively hide memory latency for the processing lanes in an AMC system. We present experimental data to show how this approach improves the performance of a set of representative benchmarks important in high performance computing applications. As a result, we are able to achieve high performance together with power efficiency using the AMC architecture.\",\"PeriodicalId\":417279,\"journal\":{\"name\":\"Proceedings of the 12th ACM International Conference on Computing Frontiers\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"67\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 12th ACM International Conference on Computing Frontiers\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2742854.2742863\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 12th ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2742854.2742863","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 67
摘要
Active Memory Cube (AMC)系统是一种新颖的异构计算系统概念,旨在为各种应用提供高性能和高能效。AMC架构包括通用主机处理器和专门设计的内存处理器(处理通道),这些处理器将集成在3D DRAM内存的逻辑层中。处理通道有很大的矢量寄存器文件,但没有耗电的缓存或本地内存缓冲区。性能取决于如何管理AMC中产生的更高的有效内存延迟。在本文中,我们描述了一种编程语言特性、编译器技术、操作系统接口和硬件设计的组合,可以有效地隐藏AMC系统中处理通道的内存延迟。我们提供了实验数据来展示这种方法如何提高高性能计算应用程序中重要的一组代表性基准的性能。因此,我们能够使用AMC架构实现高性能和节能。
Data access optimization in a processing-in-memory system
The Active Memory Cube (AMC) system is a novel heterogeneous computing system concept designed to provide high performance and power-efficiency across a range of applications. The AMC architecture includes general-purpose host processors and specially designed in-memory processors (processing lanes) that would be integrated in a logic layer within 3D DRAM memory. The processing lanes have large vector register files but no power-hungry caches or local memory buffers. Performance depends on how well the resulting higher effective memory latency within the AMC can be managed. In this paper, we describe a combination of programming language features, compiler techniques, operating system interfaces, and hardware design that can effectively hide memory latency for the processing lanes in an AMC system. We present experimental data to show how this approach improves the performance of a set of representative benchmarks important in high performance computing applications. As a result, we are able to achieve high performance together with power efficiency using the AMC architecture.