{"title":"GRAPH/Z: A Key-Value Store Based Scalable Graph Processing System","authors":"Tonglin Li, Chaoqi Ma, Jiabao Li, Xiaobing Zhou, Ke Wang, Dongfang Zhao, Iman Sadooghi, I. Raicu","doi":"10.1109/CLUSTER.2015.90","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.90","url":null,"abstract":"The emerging applications in big data and social networks issue rapidly increasing demands on graph processing. Graph query operations that involve a large number of vertices and edges can be tremendously slow on traditional databases. The state-of-the-art graph processing systems and databases usually adopt master/slave architecture that potentially impairs their The contributions of this paper are as follows: scalability. This work describes the design and implementation of a new graph processing system based on Bulk Synchronous Parallel model. Our system is built on top of ZHT, a scalable distributed key-value store, which benefits the graph processing in terms of scalability, performance and persistency. The experiment results imply excellent scalability.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124791454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Highly Scalable Parallel Search-Tree Algorithms: The Virtual Topology Approach","authors":"F. Abu-Khzam, A. E. Mouawad, Karim A. Jahed","doi":"10.1109/CLUSTER.2015.91","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.91","url":null,"abstract":"Summary form only given. We introduce the notion of a virtual topology and explore the use of search-tree indexing to achieve highly scalable parallel search-tree algorithms for NP-hard problems. Vertex Cover and Cluster Editing are used as case studies.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122216477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Collective I/O Tuning Using Analytical and Machine Learning Models","authors":"Florin Isaila, Prasanna Balaprakash, Stefan M. Wild, D. Kimpe, R. Latham, R. Ross, P. Hovland","doi":"10.1109/CLUSTER.2015.29","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.29","url":null,"abstract":"The optimization of parallel I/O has become challenging because of the increasing storage hierarchy, performance variability of shared storage systems, and the number of factors in the hardware and software stacks that impact performance. In this paper, we perform an in-depth study of the complexity involved in I/O autotuning and performance modeling, including the architecture, software stack, and noise. We propose a novel hybrid model combining analytical models for communication and storage operations and black-box models for the performance of the individual operations. The experimental results show that the hybrid approach performs significantly better and shows a higher robustness to noise than state-of-the-art machine learning approaches, at the cost of a higher modeling complexity.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116477926","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Laue Depth Reconstruction Algorithm with CUDA","authors":"Ke Yue, N. Schwarz, J. Tischler","doi":"10.1109/CLUSTER.2015.78","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.78","url":null,"abstract":"The Laue diffraction microscopy experiment uses the polychromatic Laue micro-diffraction technique to examine the structure of materials with sub-micron spatial resolution in all three dimensions. During this experiment, local crystallographic orientations, orientation gradients and strains are measured as properties which will be recorded in HDF5 image format. The recorded images will be processed with a depth reconstruction algorithm for future data analysis. But the current depth reconstruction algorithm consumes considerable processing time and might take up to 2 weeks for reconstructing data collected from one single experiment. To improve the depth reconstruction computation speed, we propose a scalable GPU program solution on the depth reconstruction problem in this paper. The test result shows that the running time would be 10 to 20 times faster than the prior CPU design for various size of input data.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133236039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Push Me Pull You: Integrating Opposing Data Transport Modes for Efficient HPC Application Monitoring","authors":"O. Aaziz, J. Cook, Hadi Sharifi","doi":"10.1109/CLUSTER.2015.118","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.118","url":null,"abstract":"While HPC system monitoring is a necessary and accepted practice, applications are still basically opaque in the production environment. For better HPC platform management and utilization, especially as platforms push towards exascale size, HPC applications need to be more transparent in their execution in the production environment. PROMON is a framework for application monitoring in the production environment, but its design concentrated on the front end issues of offering easy to use application instrumentation. This paper presents the integration of PROMON with LDMS, a proven efficient HPC system monitoring framework. PROMON and LDMS offer a case study in integrating two disparate instrumentation and monitoring models, and the lessons are applicable to other HPC monitoring issues.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"13 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114017297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Aware Job Management Approaches for Workflow in Cloud","authors":"M. Khaleel, Mengxia Zhu","doi":"10.1109/CLUSTER.2015.85","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.85","url":null,"abstract":"The energy consumption of cloud servers has dramatically increased. In order to meet the growing demands of users and reduce the skyrocketing cost of electricity, it is critical to have performance guaranteed and cost-effective job schedulers for clouds. In recent years, there has been a growing body of research which focus on improving resource utilization to improve energy efficiency, system throughput and at the same time meet the Quality of Service (QoS) requirements specified in the Service Level Agreements (SLA). This paper propose a multiple procedure scheduling algorithm which aims to maximize the resource utilization for cloud resources for reduced energy consumption as well as guarantee the execution deadline for cloud jobs modeled as scientific workflows. Our simulation results demonstrate better performance compared with other similar algorithms.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116217187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OpenSHMEM as a Portable Communication Layer for PGAS Models: A Case Study with Coarray Fortran","authors":"N. Namashivayam, Deepak Eachempati, Dounia Khaldi, B. Chapman","doi":"10.1109/CLUSTER.2015.66","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.66","url":null,"abstract":"Languages and libraries based on the Partitioned Global Address Space (PGAS) programming model have emerged in recent years with a focus on addressing the programming challenges for scalable parallel systems. Among these, Coarray Fortran (CAF) is unique in that as it has been incorporated into an existing standard (Fortran 2008), and therefore it is of particular importance that implementations supporting it are both portable and deliver sufficient levels of performance. OpenSHMEM is a library which is the culmination of a standardization effort among many implementers and users of SHMEM, and it provides a means to develop light-weight, portable, scalable applications based on the PGAS programming model. As such, we propose here that OpenSHMEM is well situated to serve as a runtime substrate for CAF implementations. In this paper, we demonstrate how OpenSHMEM can be exploited as a runtime layer upon which CAF may be implemented. Specifically, we re-targeted the CAF implementation provided in the OpenUH compiler to OpenSHMEM, and show how parallel language features provided by CAF may be directly mapped to OpenSHMEM, including allocation of remotely accessible objects, one-sided communication, and various types of synchronization. Moreover, we present and evaluate various algorithms we developed for implementing remote access of non-contiguous array sections and acquisition and release of remote locks using the OpenSHMEM interface.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"120 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116709787","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design and Development of Domain Specific Active Libraries with Proxy Applications","authors":"I. Reguly, G. Mudalige, M. Giles","doi":"10.1109/CLUSTER.2015.128","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.128","url":null,"abstract":"Representative applications are versatile tools to evaluate new programming approaches, techniques and optimisations as a way to ensure continued high performance on future computing architectures. They make experimentation much easier before adopting changes/insights into the large scientific codes. In this paper we demonstrate the important role played by representative/proxy applications in designing and developing two high-level programming approaches: namely the OP2 and OPS domain specific (active) libraries. OP2 and OPS utilizes code generation techniques to produce automatic parallelisations from a high-level abstract problem declaration. The strategy delivers significant developer productivity to the domain scientist, while at the same time allowing computational experts to adopt the latest programming models and hardware-specific optimisations into the library and code generation tools to achieve near optimal performance. We show how representative applications have been a cornerstone in the development of OP2 and OPS and chart our experiences. In particular, we demonstrate how the range of hand-tuned optimized parallelisations of the CloverLeaf hydrodynamics mini-app allowed us to gain clear evidence that the OPS based code generated parallelisations were indeed as optimal as the hand-tuned versions. Additionally, with the use of a representative application from the CFD domain we demonstrate how the optimisations discovered and applied to proxy apps are indeed directly transferable to a large-scale industrial application at Rolls Royce plc. These results provide significant evidence into the utility of representative applications to improve productivity, enable performance portability and ultimately future-proof scientific applications.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128304267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DINO: Divergent Node Cloning for Sustained Redundancy in HPC","authors":"Arash Rezaei, F. Mueller, Paul H. Hargrove, Eric Roman","doi":"10.1109/CLUSTER.2015.36","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.36","url":null,"abstract":"Soft faults like silent data corruption and hard faults like hardware failures may cause a high performance computing (HPC) job of thousands of processes to nearly cease to make progress due to recovery overheads. Redundant computing has been proposed as a solution at extreme scale by allocating two or more processes to perform the same task. However, current redundant computing approaches do not repair failed replicas. Thus, SDC-free execution is not guaranteed after a replica failure and the job may finish with incorrect results. Replicas are logically equivalent, yet may have divergent runtime states during job execution, which complicates on-the-fly repairs for forward recovery. In this work, we present a redundant execution environment that quickly repairs hard failures via Divergent Node cloning (DINO) at the MPI task level. DINO contributes a novel task cloning service integrated into the MPI runtime system that solves the problem of consolidating divergent states among replicas on-the-fly. Experimental results indicate that DINO can recover from failures nearly instantaneously, thus retaining the redundancy level throughout job execution. The cloning overhead, depending on the process image size and its transfer rate, ranges from 5.60 to 90.48 seconds. To the best of our knowledge, the design and implementation for repairing failed replicas in redundant MPI computing is unprecedented.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128410210","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VecMeter: Measuring Vectorization on the Xeon Phi","authors":"Joshua Peraza, Ananta Tiwari, W. A. Ward, R. Campbell, L. Carrington","doi":"10.1109/CLUSTER.2015.73","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.73","url":null,"abstract":"Wide vector units in Intel's Xeon Phi accelerator cards can significantly boost application performance when used effectively. However, there is a lack of performance tools that provide programmers accurate information about the level of vectorization in their codes. This paper presents VecMeter, an easy-to-use tool to measure vectorization on the Xeon Phi. VecMeter utilizes binary instrumentation and therefore does not require source code modifications. This paper describes the design of VecMeter, demonstrates its accuracy, defines a metric for quantifying vectorization, and provides an example where the tool can guide code optimization to improve performance by up to 33%.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128102653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}