{"title":"Mixed-Precision Parallel Linear Programming Solver","authors":"Mujahed Eleyat, L. Natvig","doi":"10.1109/SBAC-PAD.2010.14","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.14","url":null,"abstract":"We use mixed-precision technique, which is used to exploit the high single precision performance of modern processors, to build the first sparse mixed-precision linear programming solver on the Cell BE processor. The technique is used to enhance the performance of an LP IPM-based solver by implementing mixed-precision sparse Cholesky factorization, the most time consuming part of LP solvers. Moreover, we implemented sparse matrix multiplication of the form required by the solver as it is also very time consuming for some LP problems. Implemented on the Cell BE processor (Playstation 3) and tested using Netlib data sets, our LP solver achieved a maximum speedup of 2.9 just by using the mixed-precision technique. Moreover, we found that some problems, especially in final iterations, result in ill-conditioned matrices where mixed-precision can not be used. As a result, the solver needs to switch to double-precision if a more accurate solution of an LP problem is required.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115362392","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Computational Fluid Dynamics on the IBM Blue Gene/P Supercomputer","authors":"P. Vezolle, Jerry Heyman, Bruce D. D'Amora, G. W. Braudaway, Karen A. Magerlein, J. Magerlein, Y. Fournier","doi":"10.1109/SBAC-PAD.2010.27","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.27","url":null,"abstract":"Computational Fluid Dynamics (CFD) is an increasingly important application domain for computational scientists. In this paper, we propose and analyze optimizations necessary to run CFD simulations consisting of multi-billion-cell mesh models on large processor systems. Our investigation leverages the general industrial Navier-Stokes open-source CFD application, Code_Saturne, developed by Electricité de France (EDF). Our work considers emerging processor features such as many-core, Symmetric Multi-threading (SMT), Single Instruction Multiple Data (SIMD), Transactional Memory, and Thread Level Speculation. Initially, we have targeted per-node performance improvements by reconstructing the code and data layouts to optimally use multiple threads. We present a general loop transformation that will enable the compiler to generate OpenMP threads effectively with minimal impact to overall code structure. A renumbering scheme for mesh faces is proposed to enhance thread-level parallelism and generally improve data locality. Performance results on IBM Blue Gene/P supercomputer and Intel Xeon Westmere cluster are included.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"93 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127483008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Worst Case of Scheduling with Task Replication on Computational Grids","authors":"E. C. Xavier, Robson R. S. Peixoto","doi":"10.1109/SBAC-PAD.2010.24","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.24","url":null,"abstract":"We study the problem of scheduling tasks in a computational grid. We give analytical results for Work queue with Replication (WQR) based algorithms. There are several works presenting simulation results for scheduling algorithms for computational grid, but few provide analytical evidence of the quality of the solution of these algorithms. In this paper we show that under the TPCC metric cite{FujimotoH03} there is an optimal algorithm if the machines speed are predictable and tasks have the same length. If machines speed are not predictable we show an approximation result for the WQRxx algorithm and show that the result is tight. When tasks have different lengths the problem of minimizing the make span does not admit an approximation algorithm, even when machines speed are predictable. On the other hand, we show that the WQR based algorithm is a $m$-approximation when minimizing the TPCC in the unpredictable case, and this result is tight. To finish we show how to add replication to any scheduling algorithm using a simple interface and present computational simulations comparing the quality of the solutions of some well know algorithms with the addition of replication.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"49 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133038758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Teams in OpenMP","authors":"J. Schönherr, Jan Richling, Hans-Ulrich Heiß","doi":"10.1109/SBAC-PAD.2010.36","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.36","url":null,"abstract":"While OpenMP conceptually allows to vary the degree of parallelism from one parallel region to the next in order to adapt to the system load, this might still be too coarse-grained in certain scenarios. Especially applications designed for parallelism may stay within one parallel region for a long time. This may lead either to an oversubscribed system where individual applications are not restricted in their degree of parallelism, or to an underutilized system, because individual applications are restricted to a too small degree of parallelism. In this paper, we tackle both problems by dynamically restricting the number of active threads within a parallel region without violating the OpenMP specification.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122712719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of I/O Coordination on a NFS-Based Parallel File System with Dynamic Reconfiguration","authors":"Rodrigo Kassick, F. Boito, P. Navaux","doi":"10.1109/SBAC-PAD.2010.32","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.32","url":null,"abstract":"The large gap between processing and I/O speed makes the storage infrastructure of a cluster a great bottleneck for HPC applications. Parallel File Systems propose a solution to this issue by distributing data onto several servers, dividing the load of I/O operations and increasing the available bandwidth. However, most parallel file systems use a fixed number of I/O servers defined during initialization and do not support addition of new resources as applications’ demands grow. With the execution of different applications at the same time, the concurrent access to these resources can impact the performance and aggravate the existing bottleneck. The dNFSp File System proposes a reconfiguration mechanism that aims to include new I/O resources as application’s demands grow. These resources are standard cluster nodes and are dedicated to a single application. This paper presents a study of the I/O performance of this reconfiguration mechanism under two circunstances: the use of several independent processes on a multi-core system or of a single centralized I/O process that coordinates the requests from all instances on a node. We show that the use of coordination can improve performance of applications with regular intervals between I/O phases. For applications with no such intervals, on the other hand, uncoordinated I/O presents better performance.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124232056","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control Scheme for a CGRA","authors":"M. A. Shami, A. Hemani","doi":"10.1109/SBAC-PAD.2010.12","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.12","url":null,"abstract":"Ability to instantiate low cost and agile FSMs that can implement an arbitrary parallelism and combine such FSMs in a chain and in a hierarchy is one of the key differentiating factors between the ASICs and MPSOCs. CGRAs that have been reported in literature, like MPSOCs, also lack this ASIC like ability. The downside of ASICs is their lack of reuse and high engineering cost. We present a CGRA architecture that retains the programmability of CGRA and yet has the ASIC like ability to construct a) arbitrarily parallel data-path/FSM combine, b) chain an arbitrary number of such FSMs and c) create a hierarchy of such chains. We present in detail the architecture of such a control scheme and illustrate its use for an example composed of FFT and FIRs. We quantify the benefits of our approach by benchmarking for energy-delay product against a) ASICs (4.8X worse), b) a state-of-the-art CGRA (4.58X better) and FPGAs (63.95X better).","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123114090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cache Replacement Policy Using Adaptive Insertion and Re-reference Prediction","authors":"Xi Zhang, Chongmin Li, Haixia Wang, Dongsheng Wang","doi":"10.1109/SBAC-PAD.2010.21","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.21","url":null,"abstract":"Previous research shows that LRU replacement policy is not efficient when applications exhibit a distant re-reference interval. Recently proposed RRIP policy improves performance for such workloads. However, RRIP lacks of access recency information, which may confuse the replacement policy to make accurate prediction. Consequently, RRIP is not robust for recency-friendly workloads. This paper proposes an Adaptive Insertion and Re-reference Prediction (AI-RRP) policy which evicts data based on both re-reference prediction value and the access recency information. To make the replacement policy more adaptive across different workloads and different phases during execution, Dynamic AI-RRP (DAI-RRP) is proposed which adjusts the insertion position and prediction value for different access patterns. Simulation results show DAI-RRP reduces CPI over LRU and Dynamic RRIP by an average of 8.3% and 4.1% respectively on a single-core processor with a 1MB 16-way set last-level cache (LLC). Evaluations on quad-core CMP with a 4MB shared LLC show that DAI-RRP outperforms LRU and Dynamic RRIP (DRRIP) on the weighted speedup metric by an average of 13.2% and 26.7% respectively. Furthermore, compred to LRU, DAI-RRP requires similar hardware, or even less hardware for high-associativity cache.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"152 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121517506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Dynamic Block Remapping Cache","authors":"Felipe Pedroni, A. D. Souza, C. Badue","doi":"10.1109/SBAC-PAD.2010.39","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.39","url":null,"abstract":"In this paper we present a new architecture of Level 2 (L2) cache – the Dynamic Block Remapping Cache (DBRC). DBRC mimics important characteristics of virtual memory systems to reduce the impact of L2 in system performance. Similar to virtual memory systems, the DBRC uses a hierarchy of tables to map blocks of L2 cache into blocks of physical memory. It also uses a Block-TLB to speedup accesses to previously performed block translations. We verified that the benefits of full associativity and the consequent possibility of employment of global block replacement algorithms allow hit rates higher than those of equivalent standard caches. We compare DBRC with standard caches in terms of miss rate, energy consumption and impact on the instruction-level parallelism (ILP) of a simulated superscalar processor. Our results show that DBRC outperforms standard caches in terms of miss rate, energy consumption and impact on ILP.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133124240","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sharing Resources for Performance and Energy Optimization of Concurrent Streaming Applications","authors":"A. Benoit, Paul Renaud-Goud, Yves Robert","doi":"10.1109/SBAC-PAD.2010.19","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.19","url":null,"abstract":"We aim at finding optimal mappings for concurrent streaming applications. Each application consists of a linear chain with several stages, and processes successive data sets in pipeline mode. The objective is to minimize the energy consumption of the whole platform, while satisfying given performance-related bounds on the period and latency of each application. The problem is to decide which processors to enroll, at which speed (or mode) to use them, and which stages they should execute. We distinguish two mapping categories, interval mappings without reuse, and fully arbitrary general mappings. On the theoretical side, we establish complexity results for this tri-criteria mapping problem (energy, period, latency). Furthermore, we derive an integer linear program that provides the optimal solution in the most general case. On the experimental side, we design polynomial-time heuristics, and assess their absolute performance thanks to the linear program. One main goal is to evaluate the impact of processor sharing on the quality of the solution.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122003402","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Evidence Propagation in Junction Trees","authors":"Yinglong Xia, V. Prasanna","doi":"10.1109/SBAC-PAD.2010.25","DOIUrl":"https://doi.org/10.1109/SBAC-PAD.2010.25","url":null,"abstract":"Evidence propagation is a major step in exact inference, a key problem in exploring probabilistic graphical models. In this paper, we propose a novel approach for evidence propagation on clusters. We decompose a junction tree into a set of sub trees, and then perform evidence propagation in the sub trees in parallel. The partially updated sub trees are merged after evidence collection. In addition, we propose a technique to explore tradeoff between overhead due to startup latency of message passing and bandwidth utilization efficiency. We implemented the proposed method on state-of-the-art clusters using MPI. Experimental results show that the proposed method exhibits superior performance compared with the baseline methods.","PeriodicalId":432670,"journal":{"name":"2010 22nd International Symposium on Computer Architecture and High Performance Computing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2010-10-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121800289","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}