内存分析框架的实用近数据处理

2015 International Conference on Parallel Architecture and Compilation (PACT) Pub Date : 2015-10-18 DOI:10.1109/PACT.2015.22

Mingyu Gao, Grant Ayers, C. Kozyrakis

{"title":"内存分析框架的实用近数据处理","authors":"Mingyu Gao, Grant Ayers, C. Kozyrakis","doi":"10.1109/PACT.2015.22","DOIUrl":null,"url":null,"abstract":"The end of Dennard scaling has made all systemsenergy-constrained. For data-intensive applications with limitedtemporal locality, the major energy bottleneck is data movementbetween processor chips and main memory modules. For such workloads, the best way to optimize energy is to place processing near the datain main memory. Advances in 3D integrationprovide an opportunity to implement near-data processing (NDP) withoutthe technology problems that similar efforts had in the past. This paper develops the hardware and software of an NDP architecturefor in-memory analytics frameworks, including MapReduce, graphprocessing, and deep neural networks. We develop simple but scalablehardware support for coherence, communication, and synchronization, anda runtime system that is sufficient to support analytics frameworks withcomplex data patterns while hiding all thedetails of the NDP hardware. Our NDP architecture provides up to 16x performance and energy advantageover conventional approaches, and 2.5x over recently-proposed NDP systems. We also investigate the balance between processing and memory throughput, as well as the scalability and physical and logical organization of the memory system. Finally, we show that it is critical to optimize software frameworksfor spatial locality as it leads to 2.9x efficiency improvements for NDP.","PeriodicalId":385398,"journal":{"name":"2015 International Conference on Parallel Architecture and Compilation (PACT)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-10-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"237","resultStr":"{\"title\":\"Practical Near-Data Processing for In-Memory Analytics Frameworks\",\"authors\":\"Mingyu Gao, Grant Ayers, C. Kozyrakis\",\"doi\":\"10.1109/PACT.2015.22\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The end of Dennard scaling has made all systemsenergy-constrained. For data-intensive applications with limitedtemporal locality, the major energy bottleneck is data movementbetween processor chips and main memory modules. For such workloads, the best way to optimize energy is to place processing near the datain main memory. Advances in 3D integrationprovide an opportunity to implement near-data processing (NDP) withoutthe technology problems that similar efforts had in the past. This paper develops the hardware and software of an NDP architecturefor in-memory analytics frameworks, including MapReduce, graphprocessing, and deep neural networks. We develop simple but scalablehardware support for coherence, communication, and synchronization, anda runtime system that is sufficient to support analytics frameworks withcomplex data patterns while hiding all thedetails of the NDP hardware. Our NDP architecture provides up to 16x performance and energy advantageover conventional approaches, and 2.5x over recently-proposed NDP systems. We also investigate the balance between processing and memory throughput, as well as the scalability and physical and logical organization of the memory system. Finally, we show that it is critical to optimize software frameworksfor spatial locality as it leads to 2.9x efficiency improvements for NDP.\",\"PeriodicalId\":385398,\"journal\":{\"name\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"volume\":\"25 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-10-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"237\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Conference on Parallel Architecture and Compilation (PACT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PACT.2015.22\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Conference on Parallel Architecture and Compilation (PACT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PACT.2015.22","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 237

摘要

登纳德缩放的终结使得所有系统的能量都受到限制。对于时间局部性有限的数据密集型应用，主要的能量瓶颈是处理器芯片和主存储模块之间的数据移动。对于这样的工作负载，优化能量的最佳方法是将处理放在数据主内存附近。3D集成的进步为实现近数据处理(NDP)提供了机会，而没有过去类似努力所存在的技术问题。本文开发了内存分析框架的NDP架构的硬件和软件，包括MapReduce，图形处理和深度神经网络。我们开发了简单但可扩展的硬件支持，用于一致性，通信和同步，以及运行时系统，足以支持具有复杂数据模式的分析框架，同时隐藏了NDP硬件的所有细节。我们的NDP架构比传统方法提供高达16倍的性能和能源优势，比最近提出的NDP系统提供2.5倍。我们还研究了处理和内存吞吐量之间的平衡，以及内存系统的可伸缩性和物理和逻辑组织。最后，我们表明，优化空间局部性的软件框架至关重要，因为它会导致NDP的效率提高2.9倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Practical Near-Data Processing for In-Memory Analytics Frameworks

The end of Dennard scaling has made all systemsenergy-constrained. For data-intensive applications with limitedtemporal locality, the major energy bottleneck is data movementbetween processor chips and main memory modules. For such workloads, the best way to optimize energy is to place processing near the datain main memory. Advances in 3D integrationprovide an opportunity to implement near-data processing (NDP) withoutthe technology problems that similar efforts had in the past. This paper develops the hardware and software of an NDP architecturefor in-memory analytics frameworks, including MapReduce, graphprocessing, and deep neural networks. We develop simple but scalablehardware support for coherence, communication, and synchronization, anda runtime system that is sufficient to support analytics frameworks withcomplex data patterns while hiding all thedetails of the NDP hardware. Our NDP architecture provides up to 16x performance and energy advantageover conventional approaches, and 2.5x over recently-proposed NDP systems. We also investigate the balance between processing and memory throughput, as well as the scalability and physical and logical organization of the memory system. Finally, we show that it is critical to optimize software frameworksfor spatial locality as it leads to 2.9x efficiency improvements for NDP.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2015 International Conference on Parallel Architecture and Compilation (PACT)

自引率

0.00%

发文量