2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC): Latest Publications

A distributed dynamic load balancer for iterative applications
Harshitha Menon, L. Kalé
DOI: 10.1145/2503210.2503284 | Published: 2013-11-17
Abstract: For many applications, computation load varies over time. Such applications require dynamic load balancing to improve performance. Centralized load balancing schemes, which perform the load balancing decisions at a central location, are not scalable. In contrast, fully distributed strategies are scalable but typically do not produce a balanced work distribution because they tend to consider only local information. This paper describes a fully distributed load balancing algorithm that uses partial information about the global state of the system. This algorithm, referred to as GrapevineLB, consists of two stages: global information propagation using a lightweight scheme inspired by epidemic [21] algorithms, and work unit transfer using a randomized algorithm. We provide an analysis of the algorithm along with detailed simulation and a performance comparison with other load balancing strategies. We demonstrate the effectiveness of GrapevineLB for adaptive mesh refinement and molecular dynamics on up to 131,072 cores of BlueGene/Q.
Citations: 55
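
The abstract outlines a two-stage scheme: gossip-style propagation of partial load information followed by randomized work transfer. The following is a minimal single-process sketch of that idea, not the authors' Charm++ implementation; the round count, peer fan-out, imbalance tolerance, and transfer rule are illustrative assumptions.

```python
import random

def gossip_load_info(loads, rounds=None, fanout=2, seed=0):
    """Epidemic-style propagation: each PE repeatedly sends its partial
    view of per-PE loads to a few random peers."""
    rng = random.Random(seed)
    n = len(loads)
    views = [{i: loads[i]} for i in range(n)]       # each PE starts knowing only itself
    rounds = rounds or max(1, n.bit_length())       # ~log2(n) rounds is typically enough
    for _ in range(rounds):
        for src in range(n):
            for dst in rng.sample(range(n), k=min(fanout, n)):
                views[dst].update(views[src])       # merge partial global state
    return views

def randomized_transfer(loads, views, seed=0):
    """Overloaded PEs pick underloaded targets at random, weighted by spare capacity."""
    rng = random.Random(seed)
    new_loads = list(loads)
    for pe, view in enumerate(views):
        avg = sum(view.values()) / len(view)
        while new_loads[pe] > 1.1 * avg:            # 10% imbalance tolerance (assumed)
            targets = [p for p in view if new_loads[p] < avg]
            if not targets:
                break
            weights = [avg - new_loads[p] for p in targets]
            dst = rng.choices(targets, weights=weights, k=1)[0]
            new_loads[pe] -= 1                      # move one unit of work
            new_loads[dst] += 1
    return new_loads

if __name__ == "__main__":
    rng = random.Random(1)
    loads = [rng.randint(0, 20) for _ in range(16)]
    views = gossip_load_info(loads)
    print("before:", loads)
    print("after: ", randomized_transfer(loads, views))
```
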
A scalable, efficient scheme for evaluation of stencil computations over unstructured meshes
James King, R. Kirby
DOI: 10.1145/2503210.2503214 | Published: 2013-11-17
Abstract: Stencil computations are a common class of operations that appear in many computational science and engineering applications. Stencil computations often benefit from compile-time analysis, exploiting data locality, and parallelism. Post-processing of discontinuous Galerkin (dG) simulation solutions with B-spline kernels is an example of a numerical method which requires evaluating computationally intensive stencil operations over a mesh. Previous work on stencil computations has focused on structured meshes, while giving little attention to unstructured meshes. Performing stencil operations over an unstructured mesh requires sampling of heterogeneous elements, which often leads to inefficient memory access patterns and limits data locality and reuse. In this paper, we present an efficient method for performing stencil computations over unstructured meshes which increases data locality and cache efficiency, together with a scalable approach for stencil tiling and concurrent execution. We provide experimental results in the context of post-processing of dG solutions that demonstrate the effectiveness of our approach.
Citations: 2
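
A stencil over an unstructured mesh reduces to gathering values from each element's irregular neighbor list. The sketch below shows the CSR-style neighbor storage and a simple averaging stencil, plus a breadth-first reordering that tends to place neighboring elements close together in memory; it illustrates the general access pattern only, not the paper's tiling and scheduling scheme.

```python
from collections import deque

def apply_stencil(values, offsets, neighbors, weight=0.5):
    """One averaging-stencil sweep over an unstructured mesh stored in CSR form:
    neighbors[offsets[e]:offsets[e+1]] lists the elements adjacent to element e."""
    out = [0.0] * len(values)
    for e in range(len(values)):
        nbrs = neighbors[offsets[e]:offsets[e + 1]]
        avg = sum(values[n] for n in nbrs) / len(nbrs) if nbrs else values[e]
        out[e] = (1.0 - weight) * values[e] + weight * avg
    return out

def bfs_reorder(offsets, neighbors, start=0):
    """Breadth-first element ordering: adjacent elements end up near each other,
    which improves cache reuse of the gathers above (a common locality heuristic)."""
    n = len(offsets) - 1
    order, seen, queue = [], {start}, deque([start])
    while queue:
        e = queue.popleft()
        order.append(e)
        for nb in neighbors[offsets[e]:offsets[e + 1]]:
            if nb not in seen:
                seen.add(nb)
                queue.append(nb)
    order += [e for e in range(n) if e not in seen]   # disconnected pieces, if any
    return order

if __name__ == "__main__":
    # Tiny 4-element mesh forming a 0-1-2-3 chain.
    offsets   = [0, 1, 3, 5, 6]
    neighbors = [1, 0, 2, 1, 3, 2]
    values    = [1.0, 2.0, 3.0, 4.0]
    print(apply_stencil(values, offsets, neighbors))
    print(bfs_reorder(offsets, neighbors))
```
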
Low-power, low-storage-overhead chipkill correct via multi-line error correction
Xun Jian, Henry Duwe, J. Sartori, Vilas Sridharan, Rakesh Kumar
DOI: 10.1145/2503210.2503243 | Published: 2013-11-17
Abstract: Due to their large memory capacities, many modern servers require chipkill correct, an advanced type of memory error detection and correction, to meet their reliability requirements. However, existing chipkill-correct solutions incur high power or storage overheads, or both, because they use dedicated error-correction resources per codeword, which results in high overhead for both error detection and correction. We propose a novel chipkill-correct solution, multi-line error correction, that uses resources shared across multiple lines in memory for error correction, reducing the overhead of both error detection and correction. Our evaluations show that the proposed solution reduces memory power by a mean of 27%, and by up to 38%, with respect to commercial solutions, at the cost of a 0.4% increase in storage overhead and minimal impact on reliability.
Citations: 50
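
The central claim is that correction resources shared across several memory lines cost less storage per line than dedicated per-codeword resources. The toy model below illustrates only that principle: one XOR parity "line" shared by a group of lines reconstructs the symbol of a failed chip once detection has located it. The actual codes and detection scheme in the paper are more involved than this sketch.

```python
def _xor(symbols):
    acc = 0
    for s in symbols:
        acc ^= s
    return acc

def group_parity(lines):
    """One shared parity line per group: parity[c] is the XOR of chip c's symbol
    across all lines in the group (per-line overhead ~ 1/len(lines))."""
    chips = len(lines[0])
    return [_xor(line[c] for line in lines) for c in range(chips)]

def reconstruct(lines, parity, bad_line, bad_chip):
    """Rebuild the symbol lost to a failed chip in one line, assuming error
    detection has already located the failure (erasure correction)."""
    others = _xor(lines[l][bad_chip] for l in range(len(lines)) if l != bad_line)
    return others ^ parity[bad_chip]

if __name__ == "__main__":
    # 4 lines x 8 chips, one byte-wide symbol per chip.
    lines = [[(7 * l + 3 * c) & 0xFF for c in range(8)] for l in range(4)]
    parity = group_parity(lines)
    original = lines[2][5]
    lines[2][5] = 0xEE                  # chip 5 of line 2 returns garbage
    assert reconstruct(lines, parity, 2, 5) == original
    print("recovered symbol:", hex(original))
```
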
Coordinated energy management in heterogeneous processors
Indrani Paul, Vignesh T. Ravi, Srilatha Manne, Manish Arora, S. Yalamanchili
DOI: 10.1145/2503210.2503227 | Published: 2013-11-17
Abstract: This paper examines energy management in a heterogeneous processor consisting of an integrated CPU-GPU for high-performance computing (HPC) applications. Energy management for HPC applications is challenged by their uncompromising performance requirements and complicated by the need to coordinate energy management across distinct core types, a new and less understood problem. We examine the intra-node CPU-GPU frequency sensitivity of HPC applications on tightly coupled CPU-GPU architectures as a first step toward understanding power and performance optimization for a heterogeneous multi-node HPC system. The insights from this analysis form the basis of a coordinated energy management scheme, called DynaCo, for integrated CPU-GPU architectures. We implement DynaCo on a modern heterogeneous processor and compare its performance to a state-of-the-art power- and performance-management algorithm. DynaCo improves measured average energy-delay squared (ED^2) product by up to 30% with less than 2% average performance loss across several exascale and other HPC workloads.
Citations: 38
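
The coordination idea is to measure how sensitive each device's performance is to its frequency and then shift the shared power budget toward the more sensitive one. The loop below is a heavily simplified, hypothetical policy in that spirit; the metrics, step size, and power model are assumptions for illustration, not the DynaCo algorithm.

```python
def rebalance_power(cpu, gpu, total_budget_w, step_w=5.0):
    """One decision step: give more of the shared budget to whichever device
    loses more performance per unit of frequency taken away."""
    if cpu["sensitivity"] > gpu["sensitivity"] and gpu["power_w"] - step_w >= gpu["min_w"]:
        cpu["power_w"] += step_w
        gpu["power_w"] -= step_w
    elif gpu["sensitivity"] > cpu["sensitivity"] and cpu["power_w"] - step_w >= cpu["min_w"]:
        gpu["power_w"] += step_w
        cpu["power_w"] -= step_w
    # keep the pair within the shared socket budget
    overshoot = cpu["power_w"] + gpu["power_w"] - total_budget_w
    if overshoot > 0:
        cpu["power_w"] -= overshoot / 2
        gpu["power_w"] -= overshoot / 2
    return cpu, gpu

if __name__ == "__main__":
    # A memory-bound CPU phase (low sensitivity) next to a compute-bound GPU kernel.
    cpu = {"power_w": 45.0, "min_w": 15.0, "sensitivity": 0.2}
    gpu = {"power_w": 50.0, "min_w": 20.0, "sensitivity": 0.9}
    for _ in range(3):
        cpu, gpu = rebalance_power(cpu, gpu, total_budget_w=95.0)
    print(f"CPU {cpu['power_w']:.0f} W, GPU {gpu['power_w']:.0f} W")
```
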
SIDR: Structure-aware intelligent data routing in Hadoop
Joe B. Buck, Noah Watkins, Greg Levin, A. Crume, Kleoni Ioannidou, S. Brandt, C. Maltzahn, N. Polyzotis, Aaron Torres
DOI: 10.1145/2503210.2503241 | Published: 2013-11-17
Abstract: The MapReduce framework is being extended to domains quite different from the web applications for which it was designed, including the processing of big structured data, e.g., scientific and financial data. Previous work using MapReduce to process scientific data ignores existing structure when assigning intermediate data and scheduling tasks. In this paper, we present a method for incorporating knowledge of the structure of scientific data and of the executing query into the MapReduce communication model. Built on SciHadoop, a version of the Hadoop MapReduce framework for scientific data, SIDR intelligently partitions and routes intermediate data, allowing it to: remove Hadoop's global barrier and execute Reduce tasks prior to all Map tasks completing; minimize intermediate key skew; and produce early, correct results. SIDR executes queries up to 2.5 times faster than Hadoop and 37% faster than SciHadoop; produces initial results with only 6% of the query completed; and produces dense, contiguous output.
Citations: 3
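
Dropping the global barrier is possible because the structure of the input array and of the query determines, ahead of time, which partition each intermediate key belongs to, so a reducer knows exactly which map tasks it must wait for. The sketch below shows only that routing idea: a deterministic range-based partition function over array coordinates and the derived producer set per reducer. The function names and layout are hypothetical and unrelated to the SciHadoop/SIDR code base.

```python
def partition_for(coord, extent, num_reducers):
    """Structure-aware routing: split the global coordinate range [0, extent)
    into contiguous slabs, one per reducer, so placement is known in advance."""
    slab = -(-extent // num_reducers)          # ceiling division
    return min(coord // slab, num_reducers - 1)

def producers_per_reducer(map_chunks, extent, num_reducers):
    """Because routing is deterministic, each reducer's set of producing map
    tasks is known before execution: a reducer may start as soon as exactly
    these map tasks have finished, with no global barrier."""
    deps = {r: set() for r in range(num_reducers)}
    for map_id, (lo, hi) in enumerate(map_chunks):          # chunk covers coords [lo, hi)
        r_lo = partition_for(lo, extent, num_reducers)
        r_hi = partition_for(hi - 1, extent, num_reducers)
        for r in range(r_lo, r_hi + 1):                     # contiguous slabs => contiguous reducers
            deps[r].add(map_id)
    return deps

if __name__ == "__main__":
    chunks = [(0, 250), (250, 500), (500, 750), (750, 1000)]  # map input ranges
    print(producers_per_reducer(chunks, extent=1000, num_reducers=3))
```
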
Integrating dynamic pricing of electricity into energy aware scheduling for HPC systems
Xu Yang, Zhou Zhou, Sean Wallace, Z. Lan, Wei Tang, S. Coghlan, M. Papka
DOI: 10.1145/2503210.2503264 | Published: 2013-11-17
Abstract: The research literature to date has mainly aimed at reducing energy consumption in HPC environments. In this paper we propose a job power aware scheduling mechanism to reduce the electricity bill of HPC systems without degrading system utilization. The novelty of our job scheduling mechanism is its ability to take the variation of electricity price into consideration as a means of making better decisions about when to schedule jobs with diverse power profiles. We verified the effectiveness of our design by conducting trace-based experiments on an IBM Blue Gene/P and a cluster system as well as a case study on Argonne's 48-rack IBM Blue Gene/Q system. Our preliminary results show that our power aware algorithm can reduce the electricity bill of HPC systems by as much as 23%.
Citations: 86
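
The mechanism described is to exploit the known variation of electricity price when deciding when to launch jobs with different power profiles. A greedy toy version of that idea is sketched below: high-power jobs are steered toward the cheapest upcoming hours, subject to a node limit. The price curve, job mix, and greedy rule are illustrative assumptions, not the paper's scheduler.

```python
def schedule_by_price(jobs, hourly_price, nodes_per_hour):
    """Greedy, price-aware placement: consider the most power-hungry jobs first
    and place each in the cheapest hour that still has free nodes."""
    schedule = {}                                   # job name -> assigned hour
    free = {h: nodes_per_hour for h in range(len(hourly_price))}
    cheapest_hours = sorted(range(len(hourly_price)), key=hourly_price.__getitem__)
    for job in sorted(jobs, key=lambda j: -j["power_kw"]):
        for hour in cheapest_hours:
            if free[hour] >= job["nodes"]:
                free[hour] -= job["nodes"]
                schedule[job["name"]] = hour
                break
    return schedule

def electricity_cost(jobs, schedule, hourly_price):
    # toy model: each job runs within its assigned hour; cost = power * price
    return sum(j["power_kw"] * hourly_price[schedule[j["name"]]] for j in jobs)

if __name__ == "__main__":
    price = [0.12, 0.10, 0.09, 0.20, 0.25, 0.22]        # $/kWh, on-peak in hours 3-5
    jobs = [
        {"name": "cfd",  "nodes": 8, "power_kw": 40.0},  # power-hungry
        {"name": "post", "nodes": 4, "power_kw": 6.0},   # lightweight
        {"name": "ml",   "nodes": 8, "power_kw": 35.0},
    ]
    plan = schedule_by_price(jobs, price, nodes_per_hour=12)
    print(plan, "cost:", electricity_cost(jobs, plan, price))
```
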
Algorithms for high-throughput disk-to-disk sorting
H. Sundar, D. Malhotra, K. Schulz
DOI: 10.1145/2503210.2503259 | Published: 2013-11-17
Abstract: In this paper, we present a new out-of-core sort algorithm, designed for problems that are too large to fit into the aggregate RAM available on modern supercomputers. We analyze the performance, including the cost of IO, and demonstrate the fastest (to the best of our knowledge) reported throughput using the canonical sortBenchmark on a general-purpose, production HPC resource running Lustre. By clever use of available storage and a formulation of asynchronous data transfer mechanisms, we are able to almost completely hide the computation (sorting) behind the IO latency. This latency hiding enables us to achieve comparable execution times, including the additional temporary IO required, between a large sort problem (5TB) run as a single, in-RAM sort and our out-of-core approach using 1/10th the amount of RAM. In our largest run, sorting 100TB of records using 1792 hosts, we achieved an end-to-end throughput of 1.24TB/min using our general-purpose sorter, improving on the current Daytona record holder by 65%.
Citations: 8
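
The reported throughput comes largely from hiding the in-memory sort behind asynchronous IO. The sketch below shows the core pattern at toy scale: run generation that overlaps reading the next chunk with sorting and writing the current one (double buffering via a background thread), followed by a k-way merge of the sorted runs. File names, chunk sizes, and the threading scheme are illustrative assumptions; the paper targets parallel IO on Lustre at supercomputer scale.

```python
import heapq
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def read_chunk(f, max_lines):
    return [line for _, line in zip(range(max_lines), f)]

def make_sorted_runs(path, chunk_lines=100_000):
    """Run generation: while chunk i is being sorted and written, chunk i+1 is
    already being read by a background thread (double buffering)."""
    runs = []
    with open(path) as f, ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_chunk, f, chunk_lines)
        while True:
            chunk = pending.result()
            if not chunk:
                break
            pending = io.submit(read_chunk, f, chunk_lines)   # prefetch next chunk
            chunk.sort()                                      # overlaps with the read above
            fd, run_path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w") as out:
                out.writelines(chunk)
            runs.append(run_path)
    return runs

def merge_runs(runs, out_path):
    """Merge phase: stream a k-way merge of the sorted runs to the output file."""
    files = [open(r) for r in runs]
    try:
        with open(out_path, "w") as out:
            out.writelines(heapq.merge(*files))
    finally:
        for f in files:
            f.close()
        for r in runs:
            os.remove(r)

if __name__ == "__main__":
    # "records.txt" is a hypothetical newline-delimited input file.
    runs = make_sorted_runs("records.txt", chunk_lines=1_000_000)
    merge_runs(runs, "records.sorted.txt")
```
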
Using automated performance modeling to find scalability bugs in complex codes
A. Calotoiu, T. Hoefler, Marius Poke, F. Wolf
DOI: 10.1145/2503210.2503277 | Published: 2013-11-17
Abstract: Many parallel applications suffer from latent performance limitations that may prevent them from scaling to larger machine sizes. Often, such scalability bugs manifest themselves only when an attempt to scale the code is actually being made, a point where remediation can be difficult. However, creating analytical performance models that would allow such issues to be pinpointed earlier is so laborious that application developers attempt it at most for a few selected kernels, running the risk of missing harmful bottlenecks. In this paper, we show how both the coverage and the speed of this scalability analysis can be substantially improved. By generating an empirical performance model automatically for each part of a parallel program, we can easily identify those parts that will reduce performance at larger core counts. Using a climate simulation as an example, we demonstrate that scalability bugs are not confined to those routines usually chosen as kernels.
Citations: 137
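
The automation hinges on fitting, for each code region, a simple empirical model of runtime versus core count from a handful of small-scale runs, then extrapolating to flag regions whose cost grows too fast. The snippet below fits models of the form t(p) = c1 + c2 * p^i * log2(p)^j by least squares over a small set of candidate exponents, in the spirit of an empirical performance model normal form; the exact term set and search procedure in the paper differ.

```python
import math

CANDIDATE_TERMS = [(i, j) for i in (0, 0.5, 1, 1.5, 2) for j in (0, 1, 2)]

def _term(p, i, j):
    return (p ** i) * (math.log2(p) ** j)

def fit_model(cores, times):
    """Pick the (i, j) term whose two-parameter least-squares fit
    t(p) = c1 + c2 * p^i * log2(p)^j has the smallest residual."""
    best = None
    for i, j in CANDIDATE_TERMS:
        xs = [_term(p, i, j) for p in cores]
        n, sx, sy = len(xs), sum(xs), sum(times)
        sxx = sum(x * x for x in xs)
        sxy = sum(x * y for x, y in zip(xs, times))
        denom = n * sxx - sx * sx
        if abs(denom) < 1e-12:                 # degenerate: constant term only
            c1, c2 = sy / n, 0.0
        else:
            c2 = (n * sxy - sx * sy) / denom
            c1 = (sy - c2 * sx) / n
        resid = sum((c1 + c2 * x - y) ** 2 for x, y in zip(xs, times))
        if best is None or resid < best[0]:
            best = (resid, i, j, c1, c2)
    return best[1:]                            # (i, j, c1, c2)

if __name__ == "__main__":
    cores = [64, 128, 256, 512, 1024]
    # synthetic measurements of a region that secretly scales like p * log2(p)
    times = [0.5 + 1e-4 * p * math.log2(p) for p in cores]
    i, j, c1, c2 = fit_model(cores, times)
    print(f"best model: {c1:.3g} + {c2:.3g} * p^{i} * log2(p)^{j}")
    print("predicted at 131072 cores:", c1 + c2 * _term(131072, i, j))
```
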
GoldRush: Resource efficient in situ scientific data analytics using fine-grained interference aware execution
F. Zheng, Hongfeng Yu, Can Hantas, M. Wolf, G. Eisenhauer, K. Schwan, H. Abbasi, S. Klasky
DOI: 10.1145/2503210.2503279 | Published: 2013-11-17
Abstract: Severe I/O bottlenecks on high-end computing platforms call for running data analytics in situ. We demonstrate that compute nodes hold considerable resources left unused by typical high-end scientific simulations, and we leverage this fact with an agile runtime, termed GoldRush, that harvests those otherwise wasted, idle resources to run in situ data analytics efficiently. GoldRush uses fine-grained scheduling to "steal" idle resources, in ways that minimize interference between the simulation and the in situ analytics. This involves recognizing the potential causes of on-node resource contention and then using scheduling methods that prevent them. Experiments with representative science applications at large scales show that resources harvested on compute nodes can be leveraged to perform useful analytics, significantly improving resource efficiency and reducing the data movement costs incurred by alternative solutions, while having negligible impact on the scientific simulations.
Citations: 82
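
The core mechanism is to run analytics only inside the simulation's idle intervals (for example, while it waits on communication), so the two do not contend for cores. The toy below mimics that with an event-based handshake between a "simulation" thread and an analytics worker; the real runtime intercepts idle periods inside the simulation and applies interference-aware scheduling, which this sketch does not model.

```python
import queue
import threading
import time

idle = threading.Event()          # set while the "simulation" is waiting (e.g., on communication)
work = queue.Queue()              # analytics tasks produced by each timestep
done = threading.Event()

def simulation(steps=5):
    for step in range(steps):
        time.sleep(0.05)          # stand-in for a compute phase: analytics must stay off the core
        work.put(step)            # publish this step's data for in situ analysis
        idle.set()                # entering a wait phase: donate the core
        time.sleep(0.05)          # stand-in for waiting on communication / IO
        idle.clear()              # compute resumes: analytics should yield
    done.set()
    idle.set()                    # let the analytics thread drain its queue and exit

def analytics():
    while not (done.is_set() and work.empty()):
        idle.wait()               # only run while the simulation is idle
        try:
            step = work.get(timeout=0.01)
        except queue.Empty:
            continue
        print("analyzed step", step)

if __name__ == "__main__":
    t = threading.Thread(target=analytics)
    t.start()
    simulation()
    t.join()
```
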
Tera-scale 1D FFT with low-communication algorithm and Intel® Xeon Phi™ coprocessors
Jongsoo Park, Ganesh Bikshandi, K. Vaidyanathan, P. T. P. Tang, P. Dubey, Daehyun Kim
DOI: 10.1145/2503210.2503242 | Published: 2013-11-17
Abstract: This paper demonstrates the first tera-scale performance of Intel® Xeon Phi™ coprocessors on 1D FFT computations. Applying a disciplined performance programming methodology of sound algorithm choice, a valid performance model, and well-executed optimizations, we break the tera-flop mark on a mere 64 nodes of Xeon Phi and reach 6.7 TFLOPS with 512 nodes, which is 1.5× what is achievable on the same number of Intel® Xeon® nodes. It is a challenge to fully utilize the compute capability presented by many-core wide-vector processors for bandwidth-bound FFT computation. We leverage a new algorithm, Segment-of-Interest FFT, with low inter-node communication cost, and aggressively optimize data movement in node-local computations, exploiting caches. Our coordination of a low-communication algorithm with a massively parallel architecture for scalable performance is not limited to running FFT on Xeon Phi; it can serve as a reference for other bandwidth-bound computations and for emerging HPC systems that are increasingly communication limited.
Citations: 33
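
Large distributed 1D FFTs are commonly built from a 2D decomposition: treat the length-N signal as an N1 x N2 matrix, FFT one axis, apply twiddle factors, FFT the other axis, and read the result out transposed; that transpose is the all-to-all communication that low-communication schemes such as the Segment-of-Interest FFT work to reduce. The snippet below verifies the classic four-step decomposition against a direct FFT; it illustrates only this baseline structure, not the paper's algorithm or its Xeon Phi optimizations.

```python
import numpy as np

def four_step_fft(x, n1, n2):
    """1D FFT of length n1*n2 via the four-step (Cooley-Tukey) decomposition.
    The final column-major read-out corresponds to the transpose / all-to-all
    step that dominates communication in a distributed setting."""
    assert len(x) == n1 * n2
    a = np.asarray(x, dtype=complex).reshape(n1, n2)       # a[r, c] = x[r*n2 + c]
    b = np.fft.fft(a, axis=0)                              # n2 FFTs of length n1
    k1 = np.arange(n1).reshape(n1, 1)
    c = np.arange(n2).reshape(1, n2)
    b *= np.exp(-2j * np.pi * k1 * c / (n1 * n2))          # twiddle factors
    d = np.fft.fft(b, axis=1)                              # n1 FFTs of length n2
    return d.flatten(order="F")                            # X[k1 + n1*k2] = d[k1, k2]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)
    ours = four_step_fft(x, n1=32, n2=32)
    assert np.allclose(ours, np.fft.fft(x))
    print("four-step FFT matches numpy.fft.fft")
```
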