2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid): Latest Publications

Graph-Oriented Code Transformation Approach for Register-Limited Stencils on GPUs
Mengyao Jin, H. Fu, Zihong Lv, Guangwen Yang
DOI: 10.1109/CCGrid.2016.13

Abstract: Stencil kernels play an important role in many scientific and engineering disciplines. With the development of numerical algorithms and increasing accuracy requirements, register-limited stencils containing massive numbers of variables and operations are widely used. However, these stencils consume vast resources when executing on GPUs: the excessive use of registers dramatically reduces the number of active threads and consequently leads to a serious performance decline. To improve the performance of these register-limited stencils, we propose a DDG (data-dependency-graph) oriented code transformation approach. By analyzing, deleting, and transforming the original stencil program on GPUs, our graph-oriented code transformation approach explores the best trade-off between calculation amount and degree of parallelism, achieving better performance. The approach is evaluated using the Weighted Nearly Analytic Discrete stencil; experimental results show a speedup of 2.16x over the original, fairly optimized implementation. To the best of our knowledge, our study takes the first step towards balancing the calculation amount and parallelism degree of extremely register-limited stencils on GPUs.
Citations: 0
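The abstract's central object, a data-dependency graph over a stencil's values, can be sketched in miniature. The toy example below is ours, not the paper's algorithm: each computed value is a node, edges point at the values it depends on, and the count of non-load nodes serves as a crude proxy for register pressure that a transformation could trade against recomputation.

```python
# Illustrative DDG sketch (our own toy, not the paper's method):
# a stencil expression as a dict of named values, each either a memory
# load or an operation over earlier names.

def build_ddg(exprs):
    """exprs: name -> (op, *operands). Returns name -> list of
    dependency names that are themselves computed in exprs."""
    return {name: [d for d in deps if d in exprs]
            for name, (op, *deps) in exprs.items()}

def intermediates(exprs):
    """Count non-load nodes, i.e. values that occupy a register."""
    return sum(1 for op, *_ in exprs.values() if op != 'load')

# Toy 1-D 3-point stencil: out = a*(u[i-1] + u[i+1]) + b*u[i],
# with the sum (u[i-1] + u[i+1]) kept as a shared subexpression.
shared = {
    'um':  ('load', 'u[i-1]'),
    'up':  ('load', 'u[i+1]'),
    'uc':  ('load', 'u[i]'),
    's':   ('add', 'um', 'up'),   # reused subexpression
    't1':  ('mul', 'a', 's'),
    't2':  ('mul', 'b', 'uc'),
    'out': ('add', 't1', 't2'),
}
ddg = build_ddg(shared)
print(intermediates(shared))  # 4 intermediate values need registers
```

A transformation pass in the paper's spirit would walk such a graph and decide, per node, whether keeping the value live or recomputing it yields more active threads.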
sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access
Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu
DOI: 10.1109/CCGrid.2016.66

Abstract: The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques for intra-rack cross-node memory access have drawn attention recently. However, existing proposals are inefficient at remote memory access among server nodes due to inter-protocol conversions and non-transparent, coarse-grained accesses. In this study, we propose the high-performance, efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is to extend the on-chip AMBA AXI-4.0 interconnect of the SoC in a local server node directly to the outside, reaching remote server nodes via high-speed serial lanes. As a result, remote memory in adjacent nodes can be accessed natively, in the same manner as local memory, purely using existing software. Experimental results show that, using the sAXI data path, remote memory access performance in a user-level micro-benchmark is very promising (minimum latency: 1.16μs, maximum bandwidth: 1.52GB/s on our in-house FPGA prototype). In addition, through this efficient hardware inter-node link, the performance of an in-memory key-value framework, Redis, can be improved by up to 1.72x, and the large latency overhead of database queries can be effectively hidden.
Citations: 3
Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study
G. P. R. Álvarez, Per-Olov Östberg, E. Elmroth, K. Antypas, R. Gerber, L. Ramakrishnan
DOI: 10.1109/CCGrid.2016.32

Abstract: The high performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented toward tightly coupled MPI jobs. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the 2014 workloads of three National Energy Research Scientific Computing Center (NERSC) systems. Finally, we present the results of this analysis, observing that heterogeneity might reduce the predictability of jobs' wait times.
Citations: 18
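The paper's methodology is not reproduced in the abstract; as a minimal, hypothetical illustration of what "assessing heterogeneity" can mean, the sketch below scores a workload by the coefficient of variation of per-job core-hours. The metric choice and the synthetic jobs are ours.

```python
# Hedged sketch: quantify workload heterogeneity with the coefficient
# of variation (CV) of job core-hours. A CV of 0 means a perfectly
# homogeneous workload; large CV means a mix of very different jobs.
from statistics import mean, pstdev

def core_hours(jobs):
    """jobs: list of (cores, runtime_hours) tuples."""
    return [cores * hours for cores, hours in jobs]

def coefficient_of_variation(values):
    m = mean(values)
    return pstdev(values) / m if m else 0.0

homogeneous = [(32, 1.0)] * 8                   # identical MPI jobs
mixed = ([(1, 0.1)] * 4                         # high-throughput tasks
         + [(512, 12.0)] * 2                    # large MPI runs
         + [(32, 1.0)] * 2)                     # mid-size jobs

print(coefficient_of_variation(core_hours(homogeneous)))  # 0.0
print(coefficient_of_variation(core_hours(mixed)) > 1.0)  # True
```

A scheduler study would compute such statistics per queue and per time window rather than over a whole trace at once.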
The Latin American Giant Observatory: A Successful Collaboration in Latin America Based on Cosmic Rays and Computer Science Domains
Hernán Asorey, L. Núñez, M. Suárez-Durán, L. Torres-Niño, M. Pascual, A. J. Rubio-Montero, R. Mayo-García
DOI: 10.1109/CCGrid.2016.110

Abstract: In this work, the strategy of the Latin American Giant Observatory (LAGO) to build a Latin American collaboration is presented. By installing cosmic-ray detectors across the continent, from Mexico to Antarctica, this collaboration is forming a community that embraces both high-energy physicists and computer scientists. This is because the measured data must be analytically processed, and because a priori and a posteriori simulations representing the effects of the radiation must be performed. To perform these calculations, customized codes have been implemented by the collaboration. Given the huge amount of data emerging from this network of sensors and from the computational simulations performed on a diversity of computing architectures and e-infrastructures, an effort is being carried out to catalog and preserve the vast amount of data produced by the water-Cherenkov detector network and the complete LAGO simulation workflow that characterizes each site. The metadata, permanent identifiers, and facilities of the LAGO Data Repository are described in this work, jointly with the simulation codes used. These initiatives allow researchers to produce and find data and to use them directly in running code by means of a Science Gateway that provides access to different clusters and Grid and Cloud infrastructures worldwide.
Citations: 5
Generalized GPU Acceleration for Applications Employing Finite-Volume Methods
Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Shizhen Xu, Wenlai Zhao, Xinliang Wang, Bingwei Chen, Guangwen Yang
DOI: 10.1109/CCGrid.2016.30

Abstract: Scientific HPC applications are increasingly ported to GPUs to benefit from both their high throughput and powerful computing capacity. Many of these applications, such as atmospheric modeling and hydraulic erosion simulation, adopt the finite volume method (FVM) as the solver algorithm. However, the communication components inside these applications generally lead to a low flop-to-byte ratio and inefficient utilization of GPU resources. This paper aims at optimizing FVM solvers based on structured meshes. Besides a high-level overview of the finite-volume method and its basic optimizations on modern GPU platforms, we present two generalized tuning techniques, an explicit cache mechanism and an inner-thread rescheduling method, that try to achieve a suitable mapping between algorithm features and platform architecture. Finally, we demonstrate the impact of our generalized optimization methods on two typical atmospheric dynamics kernels (Euler and SWE) across four mainstream GPU platforms. On a Tesla K80, speedups of 24.4x for SWE and 31.5x for Euler are achieved over a 12-core Intel E5-2697 CPU, a substantial improvement over the original speedups (18x and 15.47x) without these two methods.
Citations: 4
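The "low flop-to-byte ratio" the abstract mentions can be made concrete with a back-of-the-envelope arithmetic-intensity estimate. The operation and traffic counts below are our own illustrative numbers for a generic 2-D 5-point flux update in double precision, not figures from the paper.

```python
# Rough arithmetic intensity (flops per byte of memory traffic) for a
# finite-volume-style kernel, to illustrate why such kernels tend to be
# memory-bound on GPUs. Counts are illustrative assumptions.

def arithmetic_intensity(flops_per_cell, loads_per_cell, stores_per_cell,
                         bytes_per_value=8):
    """Flops divided by bytes moved, assuming no cache reuse."""
    traffic = (loads_per_cell + stores_per_cell) * bytes_per_value
    return flops_per_cell / traffic

# 5-point flux update: ~4 adds + 5 muls per cell, 5 loads, 1 store.
ai = arithmetic_intensity(flops_per_cell=9, loads_per_cell=5, stores_per_cell=1)
print(ai)  # 0.1875 flops/byte -- far below the compute-bound regime
```

Optimizations like the paper's explicit cache mechanism raise the effective ratio by serving neighbor loads from fast on-chip memory instead of DRAM.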
Exploiting Sample Diversity in Distributed Machine Learning Systems
Zhiqiang Liu, Xuanhua Shi, Hai Jin
DOI: 10.1109/CCGrid.2016.75

Abstract: With the increasing scale of machine learning, there is a growing need for distributed systems that can execute machine learning algorithms on large clusters. Currently, most distributed machine learning systems are built on iterative optimization algorithms and the parameter server framework. However, most systems compute on all samples in every iteration, which consumes excessive computing resources since the number of samples is typically very large. In this paper, we study sample diversity and find that most samples contribute little to model updating during most iterations. Based on these findings, we propose a new iterative optimization algorithm that reduces the computation load by reusing iterative computing results. Experiments demonstrate that, compared to current methods, the proposed algorithm can reduce about 23% of the whole computation load without increasing communication.
Citations: 1
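The general idea, skipping samples whose recent contribution to the update was tiny and reusing the cached result, can be sketched as below. This is our own toy variant on a least-squares problem, not the authors' algorithm; the threshold, learning rate, and model are illustrative choices.

```python
# Hedged sketch: SGD that skips samples whose last computed gradient was
# below a threshold, trading a little accuracy for less computation.
import numpy as np

def sgd_with_skipping(X, y, iters=50, lr=0.05, eps=1e-3):
    w = np.zeros(X.shape[1])
    last_contrib = np.full(len(y), np.inf)   # force one full first pass
    computed = 0                             # gradient evaluations done
    for _ in range(iters):
        for i in range(len(y)):
            if last_contrib[i] < eps:
                continue                     # contributes little: reuse
            grad = (X[i] @ w - y[i]) * X[i]  # least-squares gradient
            w -= lr * grad
            last_contrib[i] = np.linalg.norm(grad)
            computed += 1
    return w, computed

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                               # noiseless, so SGD converges
w, computed = sgd_with_skipping(X, y)
print(computed)  # far fewer than the 5000 evaluations of plain SGD
```

The paper's contribution is doing this soundly at distributed scale under a parameter server, where skipped work also saves communication.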
Service Level and Performance Aware Dynamic Resource Allocation in Overbooked Data Centers
Luis Tomás, Ewnetu Bayuh Lakew, E. Elmroth
DOI: 10.1109/CCGrid.2016.29

Abstract: Many cloud computing providers use overbooking to increase their low utilization ratios. This, however, increases the risk of performance degradation due to interference among co-located VMs. To address this problem, we present a service level and performance aware controller that (1) provides performance isolation for high-QoS VMs, and (2) reduces interference between low-QoS VMs by dynamically mapping virtual cores to physical cores, thus limiting the amount of resources each VM can access depending on its performance. Our evaluation, based on real cloud applications and stress, synthetic, and realistic workloads, demonstrates that a more efficient use of resources is achieved, dynamically allocating the available capacity to the applications that need it most, which in turn leads to more stable and predictable performance over time.
Citations: 15
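The core-mapping idea can be illustrated with a deliberately simplified allocator (ours, not the paper's controller): high-QoS VMs get exclusive physical cores, and low-QoS VMs share the leftover pool, which is where overbooking happens.

```python
# Toy virtual-to-physical core mapping under overbooking. High-QoS VMs
# are pinned to exclusive cores; low-QoS VMs all share what remains,
# even if their combined vcore demand exceeds it. VM names are invented.

def allocate_cores(vms, physical_cores):
    """vms: list of (name, qos, vcores); qos is 'high' or 'low'.
    Returns name -> list of physical core ids (low-QoS VMs may share)."""
    free = list(range(physical_cores))
    mapping = {}
    for name, qos, vcores in vms:
        if qos == 'high':                 # exclusive, isolated cores
            mapping[name] = free[:vcores]
            free = free[vcores:]
    for name, qos, vcores in vms:
        if qos == 'low':                  # overbooked shared pool
            mapping[name] = free[:]
    return mapping

vms = [('db', 'high', 2), ('batch1', 'low', 4), ('batch2', 'low', 2)]
m = allocate_cores(vms, physical_cores=4)
print(m)  # {'db': [0, 1], 'batch1': [2, 3], 'batch2': [2, 3]}
```

The paper's controller goes further by re-mapping dynamically as measured VM performance changes; this static sketch only shows the isolation-versus-sharing split.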
KVLight: A Lightweight Key-Value Store for Distributed Access in Cloud
Jiaan Zeng, Beth Plale
DOI: 10.1109/CCGrid.2016.55

Abstract: Key-value stores (KVS) are finding use in Big Data applications because they offer a flexible data model, scalability in the number of distributed nodes, and high availability. In a cloud environment, a distributed KVS is often deployed over the local file systems of the nodes in a cluster of virtual machines (VMs). A parallel file system (PFS) offers an alternative approach to disk storage; however, a distributed key-value store running over a parallel file system can experience overheads due to its unawareness of the PFS. Additionally, a distributed KVS requires persistently running services, which is not cost-effective under the pay-as-you-go model of cloud computing, because resources must be held even during periods of no workload. We propose KVLight, a lightweight KVS that runs over a PFS. It is lightweight in the sense that it shifts the responsibility for reliable data storage to the PFS and focuses on performance. Specifically, KVLight is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability that embedded KVSs are not currently designed for. Furthermore, it allows on-demand access without running persistent services in front of the file system. Empirical results show that KVLight outperforms Cassandra and Voldemort, two state-of-the-art KVSs, under both synthetic and realistic workloads.
Citations: 2
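The "on-demand access without persistent services" idea can be shown with a minimal embedded store: any process opens the data file directly (as it would on a shared file system) and rebuilds an index, so no daemon stays running between accesses. This append-only-log sketch is our own illustration, not KVLight's actual data structures.

```python
# Minimal serverless embedded key-value store: an append-only JSON log
# plus an in-memory index rebuilt on open. Latest record for a key wins.
import json, os

class EmbeddedKV:
    def __init__(self, path):
        self.path = path
        self.index = {}               # key -> file offset of latest record
        if os.path.exists(path):      # on-demand open: rebuild the index
            with open(path, 'rb') as f:
                while True:
                    off = f.tell()
                    line = f.readline()
                    if not line:
                        break
                    self.index[json.loads(line)['k']] = off

    def put(self, key, value):
        with open(self.path, 'ab') as f:     # append-only write
            self.index[key] = f.tell()
            f.write((json.dumps({'k': key, 'v': value}) + '\n').encode())

    def get(self, key):
        with open(self.path, 'rb') as f:
            f.seek(self.index[key])
            return json.loads(f.readline())['v']

db = EmbeddedKV('kvlight_demo.db')
db.put('alpha', 1)
db.put('alpha', 2)
db.put('beta', [3, 4])
print(db.get('alpha'))  # 2 (latest write wins)
```

KVLight's contribution beyond this single-writer pattern is supporting concurrent writers over a PFS, which a naive shared log like this cannot do safely.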
Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce
A. Vulpe, M. Frîncu
DOI: 10.1109/CCGrid.2016.46

Abstract: Astrophysical applications are known to be data- and compute-intensive, with large volumes of images generated by telescopes daily. To analyze these images, data mining, statistical, and image processing techniques are applied to the raw data. Big-data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data due to their ability to process loosely coupled parallel tasks. These platforms are usually deployed in clouds; however, most astrophysical applications are legacy applications not optimized for cloud computing. While some work exists on exploiting Hadoop to store astrophysical data and process large datasets, little research has assessed the scalability of cloud-enabled astrophysical applications. In this work, we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter-cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2 and show that while data scalability is achieved, resource scalability (scale-up) is bounded and, moreover, seems to depend on the underlying cloud platform. For future work, we plan to investigate scale-out on tens of instances with large input files of several GB.
Citations: 1
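The parallelism bound the abstract states follows from how MapReduce shuffles: every distinct key becomes one reduce task, so with (cluster, subcluster) keys the reduce phase can never use more workers than there are such pairs. The single-process sketch below (invented names and data, not the authors' code) makes that visible.

```python
# Toy MapReduce-style pipeline: map detections to (cluster, subcluster)
# keys, shuffle by key, reduce per key. The number of distinct keys is
# the ceiling on reduce-phase parallelism.
from collections import defaultdict

def map_phase(records):
    """Emit ((cluster, subcluster), position) pairs."""
    return [((r['cluster'], r['sub']), r['pos']) for r in records]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """One reduce task per key; here it just counts detections."""
    return {key: len(values) for key, values in groups.items()}

records = [
    {'cluster': 'A', 'sub': 1, 'pos': (0.1, 0.2)},
    {'cluster': 'A', 'sub': 1, 'pos': (0.3, 0.1)},
    {'cluster': 'A', 'sub': 2, 'pos': (0.9, 0.5)},
    {'cluster': 'B', 'sub': 1, 'pos': (0.4, 0.8)},
]
groups = shuffle(map_phase(records))
print(len(groups))  # 3 distinct keys -> at most 3 parallel reduce tasks
```

Adding more machines beyond the key count therefore cannot speed up the reduce phase, which matches the bounded scale-up the paper observes.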
Cost-Efficient Elastic Stream Processing Using Application-Agnostic Performance Prediction
Shigeru Imai, S. Patterson, Carlos A. Varela
DOI: 10.1109/CCGrid.2016.89

Abstract: Cloud computing adds great on-demand scalability to stream processing systems with its pay-per-use cost model. However, promising service level agreements to users while keeping resource allocation costs low is a challenging task due to uncertainties from various sources, such as the target application's scalability, future computational demand, and the target cloud infrastructure's performance variability. To deal with these uncertainties, it is essential to create accurate application performance prediction models. In cloud computing, the current state of the art in performance modeling remains application-specific. We propose application-agnostic performance modeling that is applicable to a wide range of applications, along with an extension to probabilistic performance prediction. This paper reports the progress we have made so far.
Citations: 6