Managing DRAM Latency Divergence in Irregular GPGPU Applications
Niladrish Chatterjee, Mike O'Connor, G. Loh, N. Jayasena, R. Balasubramonian
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, November 2014. DOI: 10.1109/SC.2014.16

Abstract: Memory controllers in modern GPUs aggressively reorder requests for high bandwidth utilization, often interleaving requests from different warps. This leads to high variance in the latency of the different requests issued by the threads of a warp. Since a warp in a SIMT architecture can proceed only when all of its memory requests are returned by memory, such latency divergence causes significant slowdowns when running irregular GPGPU applications. To address this, we propose memory scheduling mechanisms that avoid inter-warp interference in the DRAM system to reduce the average memory stall latency experienced by warps. We further reduce latency divergence through mechanisms that coordinate scheduling decisions across multiple independent memory channels. Finally, we show that carefully orchestrating the memory scheduling policy can achieve low average latency for warps without compromising bandwidth utilization. Our combined scheme yields a 10.1% performance improvement for irregular GPGPU workloads relative to a throughput-optimized GPU memory controller.
{"title":"Oil and Water Can Mix: An Integration of Polyhedral and AST-Based Transformations","authors":"J. Shirako, L. Pouchet, Vivek Sarkar","doi":"10.1109/SC.2014.29","DOIUrl":"https://doi.org/10.1109/SC.2014.29","url":null,"abstract":"Optimizing compilers targeting modern multi-core machines require complex program restructuring to expose the best combinations of coarse- and fine-grain parallelism and data locality. The polyhedral compilation model has provided significant advancements in the seamless handling of compositions of loop transformations, thereby exposing multiple levels of parallelism and improving data reuse. However, it usually implements abstract optimization objectives, for example \"maximize data reuse\", which often does not deliver best performance, e.g., The complex loop structures generated can be detrimental to short-vector SIMD performance. In addition, several key transformations such as pipeline-parallelism and unroll-and-jam are difficult to express in the polyhedral framework. In this paper, we propose a novel optimization flow that combines polyhedral and syntactic/AST-based transformations. It generates high-performance code that contains regular loops which can be effectively vectorized, while still implementing sufficient parallelism and data reuse. It combines several transformation stages using both polyhedral and AST-based transformations, delivering performance improvements of up to 3× over the PoCC polyhedral compiler on Intel Nehalem and IBM Power7 multicore processors.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125930042","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Real-Time Scalable Cortical Computing at 46 Giga-Synaptic OPS/Watt with ~100× Speedup in Time-to-Solution and ~100,000× Reduction in Energy-to-Solution
A. Cassidy, Rodrigo Alvarez-Icaza, Filipp Akopyan, J. Sawada, J. Arthur, P. Merolla, Pallab Datta, Marc González, B. Taba, Alexander Andreopoulos, A. Amir, Steven K. Esser, J. Kusnitz, R. Appuswamy, C. Haymes, B. Brezzo, R. Moussalli, Ralph Bellofatto, C. Baks, M. Mastro, K. Schleupen, C. E. Cox, K. Inoue, S. Millman, N. Imam, E. McQuinn, Yutaka Nakamura, I. Vo, Chen Guok, Don N. Nguyen, S. Lekuch, S. Asaad, D. Friedman, Bryan L. Jackson, M. Flickner, W. Risk, R. Manohar, D. Modha
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, November 2014. DOI: 10.1109/SC.2014.8

Abstract: Drawing on neuroscience, we have developed a parallel, event-driven kernel for neurosynaptic computation that is efficient with respect to computation, memory, and communication. Building on the previously demonstrated, highly optimized software expression of the kernel, here we demonstrate TrueNorth, a co-designed silicon expression of the kernel. TrueNorth achieves a five-orders-of-magnitude reduction in energy-to-solution and a two-orders-of-magnitude speedup in time-to-solution when running computer vision applications and complex recurrent neural network simulations. Breaking with the von Neumann architecture, TrueNorth is a 4,096-core, 1-million-neuron, 256-million-synapse brain-inspired neurosynaptic processor that consumes 65 mW of power running in real time and delivers 46 Giga-Synaptic OPS/Watt. We demonstrate seamless tiling of TrueNorth chips into arrays, forming a foundation for cortex-like scalability. TrueNorth's unprecedented time-to-solution, energy-to-solution, size, scalability, and performance, combined with the underlying flexibility of the kernel, enable a broad range of cognitive applications.
NUMARCK: Machine Learning Algorithm for Resiliency and Checkpointing
Zhengzhang Chen, S. Son, W. Hendrix, Ankit Agrawal, W. Liao, A. Choudhary
SC14: International Conference for High Performance Computing, Networking, Storage and Analysis, November 2014. DOI: 10.1109/SC.2014.65

Abstract: Data checkpointing is an important fault-tolerance technique in High Performance Computing (HPC) systems. As HPC systems move towards exascale, the storage space and time costs of checkpointing threaten to overwhelm not only the simulation but also the post-simulation data analysis. One common practice to address this problem is to apply compression algorithms to reduce the data size. However, traditional lossless compression techniques that look for repeated patterns are ineffective for scientific data, whose high-precision values rarely contain such patterns. This paper exploits the fact that in many scientific applications, the relative changes in data values from one simulation iteration to the next are similar to one another. Capturing the distribution of relative changes in the data, instead of storing the data itself, allows us to incorporate the temporal dimension of the data and learn the evolving distribution of the changes. We show that an order-of-magnitude data reduction is achievable within guaranteed, user-defined error bounds for each data point. We propose NUMARCK, the Northwestern University Machine learning Algorithm for Resiliency and Checkpointing, which uses the emerging distributions of data changes between consecutive simulation iterations and encodes them into an indexing space that can be concisely represented. We evaluate NUMARCK using two production scientific simulations, FLASH and CMIP5, and demonstrate superior compression ratios and accuracy. More importantly, our algorithm allows users to specify the maximum tolerable error on a per-point basis, while compressing the data by an order of magnitude.