SC14: International Conference for High Performance Computing, Networking, Storage and Analysis - Latest Publications

Managing DRAM Latency Divergence in Irregular GPGPU Applications
Niladrish Chatterjee, Mike O'Connor, G. Loh, N. Jayasena, R. Balasubramonian
{"title":"Managing DRAM Latency Divergence in Irregular GPGPU Applications","authors":"Niladrish Chatterjee, Mike O'Connor, G. Loh, N. Jayasena, R. Balasubramonian","doi":"10.1109/SC.2014.16","DOIUrl":"https://doi.org/10.1109/SC.2014.16","url":null,"abstract":"Memory controllers in modern GPUs aggressively reorder requests for high bandwidth usage, often interleaving requests from different warps. This leads to high variance in the latency of different requests issued by the threads of a warp. Since a warp in a SIMT architecture can proceed only when all of its memory requests are returned by memory, such latency divergence causes significant slowdown when running irregular GPGPU applications. To solve this issue, we propose memory scheduling mechanisms that avoid inter-warp interference in the DRAM system to reduce the average memory stall latency experienced by warps. We further reduce latency divergence through mechanisms that coordinate scheduling decisions across multiple independent memory channels. Finally we show that carefully orchestrating the memory scheduling policy can achieve low average latency for warps, without compromising bandwidth utilization. Our combined scheme yields a 10.1% performance improvement for irregular GPGPU workloads relative to a throughput-optimized GPU memory controller.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126368661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 80
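The core mechanism the abstract describes is warp-aware scheduling: a warp's outstanding requests are grouped and drained together so unrelated traffic cannot stretch the latency of the warp's slowest request, which is what stalls SIMT execution. The Python sketch below is a software analogue of that idea only; it is not the paper's hardware scheduler, and the class and method names are hypothetical.

```python
from collections import defaultdict, deque

class WarpAwareScheduler:
    """Toy single-channel scheduler that drains one warp's batch at a time.

    Serving a warp's outstanding requests together bounds the latency of its
    slowest request, which is what gates SIMT execution.
    """

    def __init__(self):
        self.queues = defaultdict(deque)   # warp_id -> pending addresses
        self.order = deque()               # warps with a pending batch

    def enqueue(self, warp_id, address):
        if warp_id not in self.order:      # first request of a new batch
            self.order.append(warp_id)
        self.queues[warp_id].append(address)

    def next_request(self):
        """Return the next (warp_id, address) to issue, or None when idle."""
        while self.order:
            warp = self.order[0]
            if self.queues[warp]:
                return warp, self.queues[warp].popleft()
            self.order.popleft()           # batch fully drained, move on
        return None

# Example: warp 0's scattered accesses all issue before warp 1's batch starts.
sched = WarpAwareScheduler()
for addr in (0x100, 0x9000, 0x44000):
    sched.enqueue(0, addr)
for addr in (0x200, 0x7000):
    sched.enqueue(1, addr)
while (req := sched.next_request()) is not None:
    print("issue", req)
```

A real GPU memory controller also weighs row-buffer locality and bandwidth, which this sketch ignores; it only illustrates why per-warp batching narrows latency divergence.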
Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations
J. Shirako, L. Pouchet, Vivek Sarkar
{"title":"Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations","authors":"J. Shirako, L. Pouchet, Vivek Sarkar","doi":"10.1109/SC.2014.29","DOIUrl":"https://doi.org/10.1109/SC.2014.29","url":null,"abstract":"Optimizing compilers targeting modern multi-core machines require complex program restructuring to expose the best combinations of coarse- and fine-grain parallelism and data locality. The polyhedral compilation model has provided significant advancements in the seamless handling of compositions of loop transformations, thereby exposing multiple levels of parallelism and improving data reuse. However, it usually implements abstract optimization objectives, for example \"maximize data reuse\", which often does not deliver best performance, e.g., The complex loop structures generated can be detrimental to short-vector SIMD performance. In addition, several key transformations such as pipeline-parallelism and unroll-and-jam are difficult to express in the polyhedral framework. In this paper, we propose a novel optimization flow that combines polyhedral and syntactic/AST-based transformations. It generates high-performance code that contains regular loops which can be effectively vectorized, while still implementing sufficient parallelism and data reuse. It combines several transformation stages using both polyhedral and AST-based transformations, delivering performance improvements of up to 3× over the PoCC polyhedral compiler on Intel Nehalem and IBM Power7 multicore processors.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125930042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 30
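One of the syntactic transformations the abstract singles out is unroll-and-jam, which is awkward to express purely polyhedrally. The toy Python sketch below is only an illustration of what that transformation does (the paper itself concerns compiler-generated loop nests, not Python): the outer loop is unrolled by two and both copies are fused into a single inner loop, increasing reuse of x[j].

```python
# Before: reference doubly nested loop (dense matrix-vector accumulation).
def matvec(A, x, y, n):
    for i in range(n):
        for j in range(n):
            y[i] += A[i][j] * x[j]

# After unroll-and-jam: the outer i-loop is unrolled by 2 and both copies are
# "jammed" into one inner j-loop, so each x[j] load is reused for two rows.
def matvec_unroll_jam(A, x, y, n):
    i = 0
    while i + 1 < n:
        for j in range(n):
            xj = x[j]                       # loaded once, reused twice
            y[i] += A[i][j] * xj
            y[i + 1] += A[i + 1][j] * xj
        i += 2
    if i < n:                               # remainder row when n is odd
        for j in range(n):
            y[i] += A[i][j] * x[j]
```

Reasoning about this kind of rewrite is easier on a concrete AST than in a purely affine schedule, which is the motivation for combining the two approaches.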
Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution
A. Cassidy, Rodrigo Alvarez-Icaza, Filipp Akopyan, J. Sawada, J. Arthur, P. Merolla, Pallab Datta, Marc González, B. Taba, Alexander Andreopoulos, A. Amir, Steven K. Esser, J. Kusnitz, R. Appuswamy, C. Haymes, B. Brezzo, R. Moussalli, Ralph Bellofatto, C. Baks, M. Mastro, K. Schleupen, C. E. Cox, K. Inoue, S. Millman, N. Imam, E. McQuinn, Yutaka Nakamura, I. Vo, Chen Guok, Don N. Nguyen, S. Lekuch, S. Asaad, D. Friedman, Bryan L. Jackson, M. Flickner, W. Risk, R. Manohar, D. Modha
{"title":"Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution","authors":"A. Cassidy, Rodrigo Alvarez-Icaza, Filipp Akopyan, J. Sawada, J. Arthur, P. Merolla, Pallab Datta, Marc González, B. Taba, Alexander Andreopoulos, A. Amir, Steven K. Esser, J. Kusnitz, R. Appuswamy, C. Haymes, B. Brezzo, R. Moussalli, Ralph Bellofatto, C. Baks, M. Mastro, K. Schleupen, C. E. Cox, K. Inoue, S. Millman, N. Imam, E. McQuinn, Yutaka Nakamura, I. Vo, Chen Guok, Don N. Nguyen, S. Lekuch, S. Asaad, D. Friedman, Bryan L. Jackson, M. Flickner, W. Risk, R. Manohar, D. Modha","doi":"10.1109/SC.2014.8","DOIUrl":"https://doi.org/10.1109/SC.2014.8","url":null,"abstract":"Drawing on neuroscience, we have developed a parallel, event-driven kernel for neurosynaptic computation, that is efficient with respect to computation, memory, and communication. Building on the previously demonstrated highly optimized software expression of the kernel, here, we demonstrate True North, a co-designed silicon expression of the kernel. True North achieves five orders of magnitude reduction in energy to-solution and two orders of magnitude speedup in time-to solution, when running computer vision applications and complex recurrent neural network simulations. Breaking path with the von Neumann architecture, True North is a 4,096 core, 1 million neuron, and 256 million synapse brain-inspired neurosynaptic processor, that consumes 65mW of power running at real-time and delivers performance of 46 Giga-Synaptic OPS/Watt. We demonstrate seamless tiling of True North chips into arrays, forming a foundation for cortex-like scalability. True North's unprecedented time-to-solution, energy-to-solution, size, scalability, and performance combined with the underlying flexibility of the kernel enable a broad range of cognitive applications.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127047732","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 83
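The abstract describes an event-driven kernel over neurons and synapses. The sketch below is a deliberately simplified, hypothetical integrate-and-fire update in Python, nothing like TrueNorth's actual silicon kernel; it only illustrates the event-driven flavour the abstract credits for efficiency: work is done only on ticks where spikes arrive, and neurons that cross threshold emit spikes and reset.

```python
import numpy as np

def tick(potentials, weights, spikes_in, threshold=1.0, leak=0.01):
    """One time step of a toy integrate-and-fire layer (hypothetical sketch).

    potentials: (N,) membrane potentials, updated in place
    weights:    (M, N) synaptic weights from M input axons to N neurons
    spikes_in:  (M,) boolean vector of input spikes for this tick
    Returns a boolean (N,) vector of output spikes.
    """
    if spikes_in.any():
        # Event-driven flavour: synaptic accumulation only on active ticks.
        potentials += spikes_in.astype(weights.dtype) @ weights
    # Constant leak, clamped so quiet neurons settle at zero.
    np.maximum(potentials - leak, 0.0, out=potentials)
    fired = potentials >= threshold
    potentials[fired] = 0.0          # neurons that spiked reset
    return fired
```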
NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing
Zhengzhang Chen, S. Son, W. Hendrix, Ankit Agrawal, W. Liao, A. Choudhary
{"title":"NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing","authors":"Zhengzhang Chen, S. Son, W. Hendrix, Ankit Agrawal, W. Liao, A. Choudhary","doi":"10.1109/SC.2014.65","DOIUrl":"https://doi.org/10.1109/SC.2014.65","url":null,"abstract":"Data check pointing is an important fault tolerance technique in High Performance Computing (HPC) systems. As the HPC systems move towards exascale, the storage space and time costs of check pointing threaten to overwhelm not only the simulation but also the post-simulation data analysis. One common practice to address this problem is to apply compression algorithms to reduce the data size. However, traditional lossless compression techniques that look for repeated patterns are ineffective for scientific data in which high-precision data is used and hence common patterns are rare to find. This paper exploits the fact that in many scientific applications, the relative changes in data values from one simulation iteration to the next are not very significantly different from each other. Thus, capturing the distribution of relative changes in data instead of storing the data itself allows us to incorporate the temporal dimension of the data and learn the evolving distribution of the changes. We show that an order of magnitude data reduction becomes achievable within guaranteed user-defined error bounds for each data point. We propose NUMARCK, North western University Machine learning Algorithm for Resiliency and Check pointing, that makes use of the emerging distributions of data changes between consecutive simulation iterations and encodes them into an indexing space that can be concisely represented. We evaluate NUMARCK using two production scientific simulations, FLASH and CMIP5, and demonstrate a superior performance in terms of compression ratio and compression accuracy. More importantly, our algorithm allows users to specify the maximum tolerable error on a per point basis, while compressing the data by an order of magnitude.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"126 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127072425","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 50
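The abstract is concrete enough to sketch: each value's relative change from the previous iteration is mapped to a small index into a set of representative changes, and a value is stored exactly only when no representative meets the error bound. The Python sketch below approximates that idea with uniform binning; the paper instead learns the change distribution (e.g. by clustering), and the function names and the scalar relative-error bound here are mine, not the paper's.

```python
import numpy as np

def numarck_like_encode(prev, curr, rel_error_bound, n_bins=256):
    """Encode `curr` as per-point bin indices over relative changes from `prev`.

    Illustrative only: uniform binning over the observed change range stands in
    for the learned change distribution used by NUMARCK. With the default 256
    bins each encoded point is one byte instead of a 4/8-byte float.
    """
    change = (curr - prev) / np.where(prev != 0, prev, 1.0)   # relative change
    lo, hi = change.min(), change.max()
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.clip(np.digitize(change, edges) - 1, 0, n_bins - 1)

    approx = prev * (1.0 + centers[idx])                      # reconstruction
    # Points whose reconstruction misses the relative error bound stay exact.
    exact = np.abs(approx - curr) > rel_error_bound * np.abs(curr)
    return idx.astype(np.uint8), exact, curr[exact], (lo, hi)

def numarck_like_decode(prev, idx, exact, exact_vals, change_range, n_bins=256):
    lo, hi = change_range
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    out = prev * (1.0 + centers[idx])
    out[exact] = exact_vals          # out-of-bound points were stored exactly
    return out
```

Because every point that would violate the bound is carried exactly, the decoded array meets the per-point error requirement by construction; compression comes from the fraction of points that need only a one-byte index.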