2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid): Latest Publications

Graph-Oriented Code Transformation Approach for Register-Limited Stencils on GPUs
Mengyao Jin, H. Fu, Zihong Lv, Guangwen Yang
DOI: 10.1109/CCGrid.2016.13

Abstract: Stencil kernels play an important role in many scientific and engineering disciplines. With the development of numerical algorithms and increasing accuracy requirements, register-limited stencils containing massive numbers of variables and operations are widely used. However, these stencils consume vast resources when executing on GPUs: the excessive use of registers dramatically reduces the number of active threads and consequently leads to a serious performance decline. To improve the performance of these register-limited stencils, we propose a DDG (data-dependency-graph) oriented code transformation approach. By analyzing, deleting, and transforming the original stencil program on GPUs, our graph-oriented code transformation approach explores the best trade-off between calculation amount and degree of parallelism, achieving better performance. The approach is evaluated using the Weighted Nearly Analytic Discrete stencil; experimental results show a speedup of 2.16x over the original, fairly optimized implementation. To the best of our knowledge, our study takes the first step towards balancing the calculation amount and parallelism degree of extremely register-limited stencils on GPUs.
Citations: 0
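The abstract's central object, a data-dependency graph over a stencil's values, can be sketched in miniature. The toy example below is ours, not the paper's algorithm: each computed value is a node, edges point at the values it depends on, and the count of non-load nodes serves as a crude proxy for register pressure that a transformation could trade against recomputation.

```python
# Illustrative DDG sketch (our own toy, not the paper's method):
# a stencil expression as a dict of named values, each either a memory
# load or an operation over earlier names.

def build_ddg(exprs):
    """exprs: name -> (op, *operands). Returns name -> list of
    dependency names that are themselves computed in exprs."""
    return {name: [d for d in deps if d in exprs]
            for name, (op, *deps) in exprs.items()}

def intermediates(exprs):
    """Count non-load nodes, i.e. values that occupy a register."""
    return sum(1 for op, *_ in exprs.values() if op != 'load')

# Toy 1-D 3-point stencil: out = a*(u[i-1] + u[i+1]) + b*u[i],
# with the sum (u[i-1] + u[i+1]) kept as a shared subexpression.
shared = {
    'um':  ('load', 'u[i-1]'),
    'up':  ('load', 'u[i+1]'),
    'uc':  ('load', 'u[i]'),
    's':   ('add', 'um', 'up'),   # reused subexpression
    't1':  ('mul', 'a', 's'),
    't2':  ('mul', 'b', 'uc'),
    'out': ('add', 't1', 't2'),
}
ddg = build_ddg(shared)
print(intermediates(shared))  # 4 intermediate values need registers
```

A transformation pass in the paper's spirit would walk such a graph and decide, per node, whether keeping the value live or recomputing it yields more active threads.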
sAXI: A High-Efficient Hardware Inter-Node Link in ARM Server for Remote Memory Access
Ke Zhang, Yisong Chang, Lixin Zhang, Mingyu Chen, Lei Yu, Zhiwei Xu
DOI: 10.1109/CCGrid.2016.66

Abstract: The ever-growing need for fast big-data operations has made in-memory processing increasingly important in modern datacenters. To mitigate the capacity limitation of a single server node, techniques for intra-rack cross-node memory access have drawn attention recently. However, existing proposals are inefficient at remote memory access among server nodes due to inter-protocol conversions and non-transparent, coarse-grained accesses. In this study, we propose the high-performance, efficient serialized AXI (sAXI) link and its associated cross-node memory access mechanism for emerging ARM-based servers. The key idea behind sAXI is to extend the on-chip AMBA AXI-4.0 interconnect of the SoC in a local server node directly to the outside, reaching remote server nodes via high-speed serial lanes. As a result, remote memory in adjacent nodes can be accessed natively, in the same manner as local memory, purely using existing software. Experimental results show that, using the sAXI data path, remote memory access performance in a user-level micro-benchmark is very promising (minimum latency: 1.16μs, maximum bandwidth: 1.52GB/s on our in-house FPGA prototype). In addition, through this efficient hardware inter-node link, the performance of an in-memory key-value framework, Redis, can be improved by up to 1.72x, and the large latency overhead of database queries can be effectively hidden.
Citations: 3
Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study
G. P. R. Álvarez, Per-Olov Östberg, E. Elmroth, K. Antypas, R. Gerber, L. Ramakrishnan
DOI: 10.1109/CCGrid.2016.32

Abstract: The high performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented toward tightly coupled MPI jobs. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the 2014 workloads of three National Energy Research Scientific Computing Center (NERSC) systems. Finally, we present the results of this analysis, observing that heterogeneity might reduce the predictability of jobs' wait times.
Citations: 18
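The paper's methodology is not reproduced in the abstract; as a minimal, hypothetical illustration of what "assessing heterogeneity" can mean, the sketch below scores a workload by the coefficient of variation of per-job core-hours. The metric choice and the synthetic jobs are ours.

```python
# Hedged sketch: quantify workload heterogeneity with the coefficient
# of variation (CV) of job core-hours. A CV of 0 means a perfectly
# homogeneous workload; large CV means a mix of very different jobs.
from statistics import mean, pstdev

def core_hours(jobs):
    """jobs: list of (cores, runtime_hours) tuples."""
    return [cores * hours for cores, hours in jobs]

def coefficient_of_variation(values):
    m = mean(values)
    return pstdev(values) / m if m else 0.0

homogeneous = [(32, 1.0)] * 8                   # identical MPI jobs
mixed = ([(1, 0.1)] * 4                         # high-throughput tasks
         + [(512, 12.0)] * 2                    # large MPI runs
         + [(32, 1.0)] * 2)                     # mid-size jobs

print(coefficient_of_variation(core_hours(homogeneous)))  # 0.0
print(coefficient_of_variation(core_hours(mixed)) > 1.0)  # True
```

A scheduler study would compute such statistics per queue and per time window rather than over a whole trace at once.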
The Latin American Giant Observatory: A Successful Collaboration in Latin America Based on Cosmic Rays and Computer Science Domains
Hernán Asorey, L. Núñez, M. Suárez-Durán, L. Torres-Niño, M. Pascual, A. J. Rubio-Montero, R. Mayo-García
DOI: 10.1109/CCGrid.2016.110

Abstract: In this work, the strategy of the Latin American Giant Observatory (LAGO) to build a Latin American collaboration is presented. By installing cosmic-ray detectors across the continent, from Mexico to Antarctica, this collaboration is forming a community that embraces both high-energy physicists and computer scientists. This is because the measured data must be analytically processed, and because a priori and a posteriori simulations representing the effects of the radiation must be performed. To perform these calculations, customized codes have been implemented by the collaboration. Given the huge amount of data emerging from this network of sensors and from the computational simulations performed on a diversity of computing architectures and e-infrastructures, an effort is being carried out to catalog and preserve the vast amount of data produced by the water-Cherenkov detector network and the complete LAGO simulation workflow that characterizes each site. The metadata, permanent identifiers, and facilities of the LAGO Data Repository are described in this work, jointly with the simulation codes used. These initiatives allow researchers to produce and find data and to use them directly in running code by means of a Science Gateway that provides access to different clusters and Grid and Cloud infrastructures worldwide.
Citations: 5
Generalized GPU Acceleration for Applications Employing Finite-Volume Methods
Jingheng Xu, H. Fu, L. Gan, Chao Yang, Wei Xue, Shizhen Xu, Wenlai Zhao, Xinliang Wang, Bingwei Chen, Guangwen Yang
DOI: 10.1109/CCGrid.2016.30

Abstract: Scientific HPC applications are increasingly ported to GPUs to benefit from both their high throughput and powerful computing capacity. Many of these applications, such as atmospheric modeling and hydraulic erosion simulation, adopt the finite volume method (FVM) as the solver algorithm. However, the communication components inside these applications generally lead to a low flop-to-byte ratio and inefficient utilization of GPU resources. This paper aims at optimizing FVM solvers based on structured meshes. Besides a high-level overview of the finite-volume method and its basic optimizations on modern GPU platforms, we present two generalized tuning techniques, an explicit cache mechanism and an inner-thread rescheduling method, that try to achieve a suitable mapping between algorithm features and platform architecture. Finally, we demonstrate the impact of our generalized optimization methods on two typical atmospheric dynamics kernels (Euler and SWE) across four mainstream GPU platforms. On a Tesla K80, speedups of 24.4x for SWE and 31.5x for Euler are achieved over a 12-core Intel E5-2697 CPU, a substantial improvement over the original speedups (18x and 15.47x) without these two methods.
Citations: 4
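The "low flop-to-byte ratio" the abstract mentions can be made concrete with a back-of-the-envelope arithmetic-intensity estimate. The operation and traffic counts below are our own illustrative numbers for a generic 2-D 5-point flux update in double precision, not figures from the paper.

```python
# Rough arithmetic intensity (flops per byte of memory traffic) for a
# finite-volume-style kernel, to illustrate why such kernels tend to be
# memory-bound on GPUs. Counts are illustrative assumptions.

def arithmetic_intensity(flops_per_cell, loads_per_cell, stores_per_cell,
                         bytes_per_value=8):
    """Flops divided by bytes moved, assuming no cache reuse."""
    traffic = (loads_per_cell + stores_per_cell) * bytes_per_value
    return flops_per_cell / traffic

# 5-point flux update: ~4 adds + 5 muls per cell, 5 loads, 1 store.
ai = arithmetic_intensity(flops_per_cell=9, loads_per_cell=5, stores_per_cell=1)
print(ai)  # 0.1875 flops/byte -- far below the compute-bound regime
```

Optimizations like the paper's explicit cache mechanism raise the effective ratio by serving neighbor loads from fast on-chip memory instead of DRAM.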
Exploiting Sample Diversity in Distributed Machine Learning Systems
Zhiqiang Liu, Xuanhua Shi, Hai Jin
DOI: 10.1109/CCGrid.2016.75

Abstract: With the increasing scale of machine learning, there is a growing need for distributed systems that can execute machine learning algorithms on large clusters. Currently, most distributed machine learning systems are built on iterative optimization algorithms and the parameter server framework. However, most systems compute on all samples in every iteration, which consumes excessive computing resources since the number of samples is typically very large. In this paper, we study sample diversity and find that most samples contribute little to model updating during most iterations. Based on these findings, we propose a new iterative optimization algorithm that reduces the computation load by reusing iterative computing results. Experiments demonstrate that, compared to current methods, the proposed algorithm can reduce about 23% of the whole computation load without increasing communication.
Citations: 1
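The general idea, skipping samples whose recent contribution to the update was tiny and reusing the cached result, can be sketched as below. This is our own toy variant on a least-squares problem, not the authors' algorithm; the threshold, learning rate, and model are illustrative choices.

```python
# Hedged sketch: SGD that skips samples whose last computed gradient was
# below a threshold, trading a little accuracy for less computation.
import numpy as np

def sgd_with_skipping(X, y, iters=50, lr=0.05, eps=1e-3):
    w = np.zeros(X.shape[1])
    last_contrib = np.full(len(y), np.inf)   # force one full first pass
    computed = 0                             # gradient evaluations done
    for _ in range(iters):
        for i in range(len(y)):
            if last_contrib[i] < eps:
                continue                     # contributes little: reuse
            grad = (X[i] @ w - y[i]) * X[i]  # least-squares gradient
            w -= lr * grad
            last_contrib[i] = np.linalg.norm(grad)
            computed += 1
    return w, computed

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true                               # noiseless, so SGD converges
w, computed = sgd_with_skipping(X, y)
print(computed)  # far fewer than the 5000 evaluations of plain SGD
```

The paper's contribution is doing this soundly at distributed scale under a parameter server, where skipped work also saves communication.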
Service Level and Performance Aware Dynamic Resource Allocation in Overbooked Data Centers
Luis Tomás, Ewnetu Bayuh Lakew, E. Elmroth
DOI: 10.1109/CCGrid.2016.29

Abstract: Many cloud computing providers use overbooking to increase their low utilization ratios. This, however, increases the risk of performance degradation due to interference among co-located VMs. To address this problem, we present a service level and performance aware controller that (1) provides performance isolation for high-QoS VMs, and (2) reduces interference between low-QoS VMs by dynamically mapping virtual cores to physical cores, thus limiting the amount of resources each VM can access depending on its performance. Our evaluation, based on real cloud applications and stress, synthetic, and realistic workloads, demonstrates that a more efficient use of resources is achieved, dynamically allocating the available capacity to the applications that need it most, which in turn leads to more stable and predictable performance over time.
Citations: 15
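The core-mapping idea can be illustrated with a deliberately simplified allocator (ours, not the paper's controller): high-QoS VMs get exclusive physical cores, and low-QoS VMs share the leftover pool, which is where overbooking happens.

```python
# Toy virtual-to-physical core mapping under overbooking. High-QoS VMs
# are pinned to exclusive cores; low-QoS VMs all share what remains,
# even if their combined vcore demand exceeds it. VM names are invented.

def allocate_cores(vms, physical_cores):
    """vms: list of (name, qos, vcores); qos is 'high' or 'low'.
    Returns name -> list of physical core ids (low-QoS VMs may share)."""
    free = list(range(physical_cores))
    mapping = {}
    for name, qos, vcores in vms:
        if qos == 'high':                 # exclusive, isolated cores
            mapping[name] = free[:vcores]
            free = free[vcores:]
    for name, qos, vcores in vms:
        if qos == 'low':                  # overbooked shared pool
            mapping[name] = free[:]
    return mapping

vms = [('db', 'high', 2), ('batch1', 'low', 4), ('batch2', 'low', 2)]
m = allocate_cores(vms, physical_cores=4)
print(m)  # {'db': [0, 1], 'batch1': [2, 3], 'batch2': [2, 3]}
```

The paper's controller goes further by re-mapping dynamically as measured VM performance changes; this static sketch only shows the isolation-versus-sharing split.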
KVLight: A Lightweight Key-Value Store for Distributed Access in Cloud
Jiaan Zeng, Beth Plale
DOI: 10.1109/CCGrid.2016.55

Abstract: Key-value stores (KVS) are finding use in Big Data applications because they offer a flexible data model, scalability in the number of distributed nodes, and high availability. In a cloud environment, a distributed KVS is often deployed over the local file systems of the nodes in a cluster of virtual machines (VMs). A parallel file system (PFS) offers an alternative approach to disk storage; however, a distributed key-value store running over a parallel file system can experience overheads due to its unawareness of the PFS. Additionally, a distributed KVS requires persistently running services, which is not cost-effective under the pay-as-you-go model of cloud computing, because resources must be held even during periods of no workload. We propose KVLight, a lightweight KVS that runs over a PFS. It is lightweight in the sense that it shifts the responsibility for reliable data storage to the PFS and focuses on performance. Specifically, KVLight is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes, a capability that embedded KVSs are not currently designed for. Furthermore, it allows on-demand access without running persistent services in front of the file system. Empirical results show that KVLight outperforms Cassandra and Voldemort, two state-of-the-art KVSs, under both synthetic and realistic workloads.
Citations: 2
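The "on-demand access without persistent services" idea can be shown with a minimal embedded store: any process opens the data file directly (as it would on a shared file system) and rebuilds an index, so no daemon stays running between accesses. This append-only-log sketch is our own illustration, not KVLight's actual data structures.

```python
# Minimal serverless embedded key-value store: an append-only JSON log
# plus an in-memory index rebuilt on open. Latest record for a key wins.
import json, os

class EmbeddedKV:
    def __init__(self, path):
        self.path = path
        self.index = {}               # key -> file offset of latest record
        if os.path.exists(path):      # on-demand open: rebuild the index
            with open(path, 'rb') as f:
                while True:
                    off = f.tell()
                    line = f.readline()
                    if not line:
                        break
                    self.index[json.loads(line)['k']] = off

    def put(self, key, value):
        with open(self.path, 'ab') as f:     # append-only write
            self.index[key] = f.tell()
            f.write((json.dumps({'k': key, 'v': value}) + '\n').encode())

    def get(self, key):
        with open(self.path, 'rb') as f:
            f.seek(self.index[key])
            return json.loads(f.readline())['v']

db = EmbeddedKV('kvlight_demo.db')
db.put('alpha', 1)
db.put('alpha', 2)
db.put('beta', [3, 4])
print(db.get('alpha'))  # 2 (latest write wins)
```

KVLight's contribution beyond this single-writer pattern is supporting concurrent writers over a PFS, which a naive shared log like this cannot do safely.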
Exploring Scalability in Pattern Finding in Galactic Structure Using MapReduce
A. Vulpe, M. Frîncu
DOI: 10.1109/CCGrid.2016.46

Abstract: Astrophysical applications are known to be data- and compute-intensive, with large volumes of images generated by telescopes daily. To analyze these images, data mining, statistical, and image processing techniques are applied to the raw data. Big-data platforms such as MapReduce are ideal candidates for processing and storing astrophysical data due to their ability to process loosely coupled parallel tasks. These platforms are usually deployed in clouds; however, most astrophysical applications are legacy applications not optimized for cloud computing. While some work exists on exploiting Hadoop to store astrophysical data and process large datasets, little research has assessed the scalability of cloud-enabled astrophysical applications. In this work, we analyze the data and resource scalability of MapReduce applications for astrophysical problems related to cluster detection and inter-cluster spatial pattern search. The maximum level of parallelism is bounded by the number of clusters and the number of (cluster, subcluster) pairs in the pattern search. We perform scale-up tests on Google Compute Engine and Amazon EC2 and show that while data scalability is achieved, resource scalability (scale-up) is bounded and, moreover, seems to depend on the underlying cloud platform. For future work, we plan to investigate scale-out on tens of instances with large input files of several GB.
Citations: 1
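The parallelism bound the abstract states follows from how MapReduce shuffles: every distinct key becomes one reduce task, so with (cluster, subcluster) keys the reduce phase can never use more workers than there are such pairs. The single-process sketch below (invented names and data, not the authors' code) makes that visible.

```python
# Toy MapReduce-style pipeline: map detections to (cluster, subcluster)
# keys, shuffle by key, reduce per key. The number of distinct keys is
# the ceiling on reduce-phase parallelism.
from collections import defaultdict

def map_phase(records):
    """Emit ((cluster, subcluster), position) pairs."""
    return [((r['cluster'], r['sub']), r['pos']) for r in records]

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """One reduce task per key; here it just counts detections."""
    return {key: len(values) for key, values in groups.items()}

records = [
    {'cluster': 'A', 'sub': 1, 'pos': (0.1, 0.2)},
    {'cluster': 'A', 'sub': 1, 'pos': (0.3, 0.1)},
    {'cluster': 'A', 'sub': 2, 'pos': (0.9, 0.5)},
    {'cluster': 'B', 'sub': 1, 'pos': (0.4, 0.8)},
]
groups = shuffle(map_phase(records))
print(len(groups))  # 3 distinct keys -> at most 3 parallel reduce tasks
```

Adding more machines beyond the key count therefore cannot speed up the reduce phase, which matches the bounded scale-up the paper observes.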
Cost-Efficient Elastic Stream Processing Using Application-Agnostic Performance Prediction
Shigeru Imai, S. Patterson, Carlos A. Varela
DOI: 10.1109/CCGrid.2016.89

Abstract: Cloud computing adds great on-demand scalability to stream processing systems with its pay-per-use cost model. However, promising service level agreements to users while keeping resource allocation costs low is a challenging task due to uncertainties from various sources, such as the target application's scalability, future computational demand, and the target cloud infrastructure's performance variability. To deal with these uncertainties, it is essential to create accurate application performance prediction models. In cloud computing, the current state of the art in performance modeling remains application-specific. We propose application-agnostic performance modeling that is applicable to a wide range of applications, along with an extension to probabilistic performance prediction. This paper reports the progress we have made so far.
Citations: 6