2015 IEEE International Conference on Cluster Computing: Latest Publications

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.154
T. Hanawa, H. Fujii, N. Fujita, Tetsuya Odajima, Kazuya Matsumoto, Yuetsu Kodama, T. Boku
The Tightly Coupled Accelerators (TCA) architecture that we proposed in previous work enables direct communication between accelerators across nodes. In this paper, we present a proof-of-concept GPU cluster called HA-PACS/TCA using the PEACH2 chip, which we designed as an interconnection router chip based on the TCA architecture. Our system demonstrated 2.0 μs latency on inter-node GPU-to-GPU communication over PCIe Gen2 x8 by RDMA, reducing the minimum latency to just 44% of that of InfiniBand QDR with MPI using GPUDirect for RDMA. Through Himeno benchmark tests, we demonstrated that the TCA architecture improves performance scalability on small problem sizes by up to 61%.
Citations: 0
IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.17
Bo Feng, Xi Yang, Kun Feng, Yanlong Yin, Xian-He Sun
Hadoop, as one of the most widely adopted MapReduce frameworks, is naturally data-intensive, and its dependent projects, such as Mahout and Hive, inherit this characteristic. Meanwhile, I/O optimization becomes daunting work, since applications' source code is not always available. I/O traces for Hadoop and its dependents are increasingly important because they can faithfully reveal intrinsic I/O behaviors without knowledge of the source code. This method can not only help diagnose system bottlenecks but also further optimize performance. To achieve this goal, we propose a transparent tracing and analysis tool suite, named IOSIG+, which can be plugged into a Hadoop system. We make several contributions: 1) we describe our approach to tracing; 2) we release the tracer, which can trace I/O operations without modifying the target's source code; 3) we adopt several techniques to mitigate the execution overhead introduced at runtime; 4) we create an analyzer, which helps discover new approaches to address I/O problems according to access patterns. The experimental results and analysis confirm its effectiveness; the observed overhead can be as low as 1.97%.
Citations: 6
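IOSIG+'s tracer hooks I/O calls at runtime without touching application source. The general interposition idea can be sketched in plain Python by replacing the built-in `open` with a logging proxy; all names below are illustrative, not IOSIG+'s actual API:

```python
import builtins
import os
import tempfile
import time

trace_log = []            # collected records: (timestamp, op, path, nbytes)
_real_open = builtins.open

class TracedFile:
    """Proxy that logs read/write calls made through a wrapped file object."""
    def __init__(self, f, path):
        self._f, self._path = f, path

    def read(self, *args):
        data = self._f.read(*args)
        trace_log.append((time.time(), "read", self._path, len(data)))
        return data

    def write(self, data):
        n = self._f.write(data)
        trace_log.append((time.time(), "write", self._path, n))
        return n

    def __getattr__(self, name):          # delegate everything else untouched
        return getattr(self._f, name)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._f.close()

def traced_open(path, mode="r", **kw):
    return TracedFile(_real_open(path, mode, **kw), path)

builtins.open = traced_open               # interpose; app code is unchanged

path = os.path.join(tempfile.gettempdir(), "iosig_demo.txt")
with open(path, "w") as f:
    f.write("hello")
with open(path) as f:
    f.read()

builtins.open = _real_open                # restore the real open
print(trace_log)
```

A production tracer would interpose at the libc or HDFS-client layer (e.g., via `LD_PRELOAD`) rather than in the interpreter, but the access-pattern log it produces has the same shape.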
Evaluation of FFT for GPU Cluster Using Tightly Coupled Accelerators Architecture
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.113
T. Hanawa, H. Fujii, N. Fujita, Tetsuya Odajima, Kazuya Matsumoto, T. Boku
Inter-node communication between accelerators in heterogeneous clusters incurs extra latency because of the time required to transfer data copies between the host and the accelerator. Such communication latencies inhibit the optimal performance of affected applications. To address this problem, we proposed the Tightly Coupled Accelerators (TCA) architecture and designed an interconnection router chip named PEACH2. Accelerators in the TCA architecture communicate directly via the PCIe protocol, the fundamental interface shared by all current accelerators and host CPUs, eliminating protocol and data-copy overheads. In this paper, we apply the TCA architecture to the Fast Fourier Transform (FFT), which is commonly used in scientific computations. First, we implemented all-to-all communication on TCA. This all-to-all communication was then applied to FFTE, one implementation of the FFT. Based on evaluation results on the HA-PACS/TCA system, we achieved a speedup of 2.7x with TCA compared with MPI using 16 nodes on the medium problem size.
Citations: 2
Exploring Memory Hierarchy to Improve Scientific Data Read Performance
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.18
Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steve Harenberg, Qing Liu, S. Klasky, N. Samatova
Improving read performance is one of the major challenges in speeding up scientific data-analytic applications. Exploiting the memory hierarchy is one major line of research addressing the read-performance bottleneck. Related methods usually combine solid-state drives (SSDs) with dynamic random-access memory (DRAM) and/or a parallel file system (PFS) to mitigate the speed and space gap between DRAM and the PFS. However, these methods are unable to handle a key performance issue plaguing SSDs, namely read contention, which may cause up to a 50% performance reduction. In this paper, we propose a framework that exploits memory-hierarchy resources to address the read-contention issues of SSDs. The framework employs a general-purpose online read algorithm that is able to detect and utilize memory-hierarchy resources to relieve the problem. To maintain a near-optimal operating environment for the SSDs, the framework orchestrates data chunks across the different memory layers to facilitate the read algorithm. Compared to existing tools, our framework achieves up to a 50% read-performance improvement when tested on datasets from real-world scientific simulations.
Citations: 7
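The chunk orchestration described above can be illustrated with a toy two-tier read cache: a small fast tier (standing in for DRAM) in front of a larger slow tier (standing in for the SSD), with misses falling through to a backing store (the PFS). Hot chunks are promoted to the fast tier so repeated reads stop contending for the slow tier. This is a generic sketch, not the paper's algorithm:

```python
from collections import OrderedDict

class TieredReadCache:
    """Illustrative two-tier LRU read cache over a backing store."""
    def __init__(self, fast_capacity, slow_capacity, backing):
        self.fast = OrderedDict()   # chunk_id -> data, LRU order (DRAM tier)
        self.slow = OrderedDict()   # SSD tier
        self.fast_cap, self.slow_cap = fast_capacity, slow_capacity
        self.backing = backing      # dict standing in for the parallel FS
        self.hits = {"fast": 0, "slow": 0, "miss": 0}

    def read(self, chunk_id):
        if chunk_id in self.fast:
            self.fast.move_to_end(chunk_id)
            self.hits["fast"] += 1
            return self.fast[chunk_id]
        if chunk_id in self.slow:
            self.hits["slow"] += 1
            data = self.slow.pop(chunk_id)   # promote hot chunk to fast tier
            self._put(self.fast, self.fast_cap, chunk_id, data,
                      demote_to=self.slow)
            return data
        self.hits["miss"] += 1
        data = self.backing[chunk_id]        # fall through to the PFS
        self._put(self.slow, self.slow_cap, chunk_id, data)
        return data

    def _put(self, tier, cap, cid, data, demote_to=None):
        tier[cid] = data
        tier.move_to_end(cid)
        while len(tier) > cap:               # evict LRU; demotion target is
            old_id, old = tier.popitem(last=False)  # always the slow tier
            if demote_to is not None:
                self._put(demote_to, self.slow_cap, old_id, old)

store = {i: f"chunk-{i}" for i in range(10)}
cache = TieredReadCache(fast_capacity=2, slow_capacity=4, backing=store)
for cid in [0, 1, 0, 0, 2, 1]:
    cache.read(cid)
print(cache.hits)
```

The framework's online algorithm additionally monitors SSD contention to decide when to promote, which this sketch omits.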
Throughput Unfairness in Dragonfly Networks under Realistic Traffic Patterns
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.136
Pablo Fuentes, E. Vallejo, C. Camarero, R. Beivide, M. Valero
Dragonfly networks have a two-level hierarchical arrangement of the network routers and allow for a competitive cost-performance solution in large systems. Non-minimal adaptive routing is employed to fully exploit the path diversity and increase performance under adversarial traffic patterns. Throughput unfairness prevents a balanced use of the resources across the network nodes and severely degrades the performance of any application running on an affected node. Previous works have demonstrated the presence of throughput unfairness in Dragonflies under certain adversarial traffic patterns and proposed different alternatives to effectively combat this effect. In this paper we introduce a new traffic pattern, denoted adversarial consecutive (ADVc), which portrays a real use case, and evaluate its impact on network performance and throughput fairness. This traffic pattern is the most adversarial in terms of network fairness. Our evaluations, both with and without transit-over-injection priority, show that global misrouting policies do not properly alleviate this problem. Therefore, explicit fairness mechanisms are required in these networks.
Citations: 10
Optimizing Caching DSM for Distributed Software Speculation
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.68
S. C. Koduru, Keval Vora, Rajiv Gupta
Clusters with caching DSMs deliver programmability and performance by supporting shared-memory programming and tolerate remote I/O latencies via caching. The input to a data-parallel program is partitioned across the cluster while the DSM transparently fetches and caches remote data as needed. Irregular applications, however, are challenging to parallelize because input-related data dependences that manifest at runtime require the use of speculation for correct parallel execution. By speculating that there are no input-related cross-iteration dependences, private copies of the input can be processed by parallelizing the loop; the absence of dependences is then validated before the computed results are committed. We show that while caching helps tolerate long communication latencies in irregular data-parallel applications, using cached values in a computation can lead to misspeculation, and thus aggressive caching can degrade performance through an increased misspeculation rate. We present optimizations for distributed speculation on caching-based DSMs that decrease the cost of misspeculation checks and speed up the re-execution of misspeculated computations. Optimized distributed speculation achieves speedups over unoptimized speculation of 2.24x for coloring, 1.71x for connected components, 1.88x for community detection, 1.32x for shortest path, and 1.74x for PageRank.
Citations: 5
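The speculate-validate-commit cycle described in the abstract can be sketched sequentially: every iteration computes against a private snapshot of the input, and validation re-executes any iteration whose read location was written by an earlier committed iteration. A toy illustration, not the paper's DSM implementation:

```python
def run_speculative(data, tasks):
    """Toy speculate-validate-commit loop. Each 'iteration' (src, dst, f)
    reads data[src] and writes data[dst]. The speculative phase computes
    all iterations against a private snapshot; validation then re-executes
    any iteration whose read location was written by an earlier iteration,
    before committing results in iteration order."""
    snapshot = dict(data)
    # speculative phase: all iterations read the private snapshot
    spec = [(src, dst, f, f(snapshot[src])) for src, dst, f in tasks]
    written, misspeculations = set(), 0
    for src, dst, f, val in spec:
        if src in written:            # stale read detected: re-execute
            misspeculations += 1
            val = f(data[src])
        data[dst] = val               # commit
        written.add(dst)
    return misspeculations

state = {"a": 1, "b": 0, "c": 0}
# iteration 2 reads "b", which iteration 1 writes -> one misspeculation
tasks = [("a", "b", lambda x: x + 1), ("b", "c", lambda x: x * 10)]
retries = run_speculative(state, tasks)
print(state, retries)
```

The paper's contribution sits in the cost of that validation step and of the re-executions when the snapshot lives in a distributed cache rather than local memory.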
Overcoming Hadoop Scaling Limitations through Distributed Task Execution
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.42
Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Tonglin Li, M. Lang, Xian-He Sun, I. Raicu
Data-driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts in the Hadoop implementation and framework decoupling (e.g., YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, task scheduler, and HDFS metadata management adversely affect Hadoop's scalability toward tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task-execution framework, MATRIX, which was originally designed to schedule the execution of data-intensive scientific many-task computing applications on supercomputers. We propose to leverage the distributed design wisdom of MATRIX to schedule arbitrary data-processing applications in the cloud. We compare MATRIX with YARN in processing typical Hadoop workloads, such as WordCount, TeraSort, Grep, and RandomWriter, as well as the Ligand bioinformatics application, on the Amazon cloud. Experimental results show that MATRIX outperforms YARN by 1.27x for the typical workloads and by 2.04x for the real application. We also run and simulate MATRIX with fine-grained, sub-second workloads. With simulation results showing an efficiency of 86.8% at 64K cores for the 150 ms workload, we show that MATRIX has the potential to enable Hadoop to scale to extreme-scale data centers for fine-grained workloads.
Citations: 68
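One common way frameworks in this space replace a centralized scheduler with a distributed one is work stealing among per-node queues: an idle node probes a random victim and takes part of its queue, rather than asking a central master. The toy simulation below illustrates only that general idea, not MATRIX's actual protocol:

```python
import random
from collections import deque

def simulate_work_stealing(num_nodes, tasks, seed=0):
    """Toy simulation of distributed task execution: every node owns a local
    queue; an idle node steals half the queue of a randomly probed victim
    instead of waiting on a centralized scheduler (the bottleneck the paper
    targets in YARN). Returns tasks executed per node."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_nodes)]
    for t in tasks:                        # deliberately skewed placement:
        queues[0].append(t)                # all work starts on node 0
    executed = [0] * num_nodes
    remaining = len(tasks)
    while remaining:
        for n in range(num_nodes):
            if not queues[n]:              # idle: probe a random victim
                victim = rng.randrange(num_nodes)
                if queues[victim]:
                    for _ in range(max(1, len(queues[victim]) // 2)):
                        queues[n].append(queues[victim].pop())
            if queues[n]:
                queues[n].popleft()        # execute one task this round
                executed[n] += 1
                remaining -= 1
    return executed

done = simulate_work_stealing(num_nodes=4, tasks=list(range(100)))
print(done)
```

Even with the worst-case initial skew, the steals spread the 100 tasks across the nodes without any global coordination point.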
Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1145/2822332.2822338
J. Wozniak, Timothy G. Armstrong, K. Maheshwari, D. Katz, M. Wilde, Ian T Foster
Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and their available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of operating-system limitations, interoperability challenges, parallel-filesystem overheads due to the small file-system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces of the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.
Citations: 13
Dynamic CPU Resource Allocation in Containerized Cloud Environments
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.99
J. M. Diaz, A. Landwehr, M. Taufer
In recent years, lighter-weight virtualization solutions have begun to emerge as an alternative to virtual machines. Because these solutions are still in their infancy, however, several research questions remain open regarding how to effectively manage computing resources. One important problem is the management of resources in the event of overutilization. For some applications, overutilization can severely affect performance. We provide a solution to this problem by extending the concept of timeslicing to the level of the virtualization container. Through this approach we can control and mitigate some of the more detrimental performance effects of oversubscription. Our results show significant improvement over standard scheduling with Docker.
Citations: 32
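Container runtimes expose this kind of per-container timeslicing through the kernel's CFS bandwidth controller: Docker's `--cpu-period`/`--cpu-quota` pair caps a container's CPU time per scheduling period. The helper below is a hypothetical sketch (not the paper's controller) of how a manager might shrink allocations proportionally when demands oversubscribe the machine:

```python
def cfs_quota(cpu_share, period_us=100_000):
    """Translate a fractional CPU allocation (in cores) into the CFS
    (quota_us, period_us) pair that Docker exposes via
    --cpu-quota/--cpu-period. quota=50_000 with period=100_000 throttles
    a container to half a core per period."""
    if cpu_share <= 0:
        raise ValueError("cpu_share must be positive")
    return int(cpu_share * period_us), period_us

def rebalance(demands, total_cores):
    """Hypothetical proportional rebalance: when containers' combined
    demands exceed the machine (oversubscription), shrink every
    allocation by the same factor so the sum fits."""
    total = sum(demands.values())
    scale = min(1.0, total_cores / total) if total else 1.0
    return {name: cfs_quota(d * scale) for name, d in demands.items()}

# two containers ask for 8 cores total on a 4-core host
allocs = rebalance({"web": 2.0, "batch": 6.0}, total_cores=4)
print(allocs)
# each pair could then be applied with:
#   docker update --cpu-quota <quota> --cpu-period <period> <container>
```

Proportional scaling is just one policy; the paper's point is that adjusting these knobs dynamically, rather than leaving the default fair-share scheduling, avoids the worst oversubscription effects.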
Flexible Error Recovery Using Versions in Global View Resilience
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.88
N. Dun, H. Fujita, A. Fang, Yan Liu, A. Chien, P. Balaji, K. Iskra, Wesley Bland, A. Siegel
We present the Global View Resilience (GVR) system, a library that enables applications to add resilience in a portable, application-controlled fashion using versioned distributed arrays. We briefly describe GVR's interfaces for distributed arrays, versioning, and cross-layer error recovery. We illustrate how GVR can be used for rollback recovery and a wide range of additional error-recovery techniques, including forward recovery for latent errors or silent data corruptions. Application results demonstrate that GVR's interfaces and implementation are portable, flexible (supporting a variety of recovery models), and efficient, and create a gentle-slope path to tolerating growing error rates in future systems.
Citations: 2
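The versioning idea can be sketched in a few lines: the application commits named versions of an array and rolls back to the last good one when it detects an error. A minimal single-process illustration, not GVR's actual interface:

```python
class VersionedArray:
    """Minimal sketch of GVR-style application-controlled versioning:
    the application commits snapshots at points it knows are consistent
    and can restore any earlier version for rollback (or compare versions
    for forward recovery from latent errors)."""
    def __init__(self, data):
        self.data = list(data)
        self.versions = []               # snapshots, indexed by version id

    def commit(self):
        self.versions.append(list(self.data))
        return len(self.versions) - 1    # version id

    def restore(self, version_id):
        self.data = list(self.versions[version_id])

arr = VersionedArray([0.0] * 4)
v0 = arr.commit()
arr.data[2] = 7.5                # a computation step
v1 = arr.commit()
arr.data[2] = float("nan")       # simulate a silent data corruption
if arr.data[2] != arr.data[2]:   # NaN check: application detects the error
    arr.restore(v1)              # roll back to the last good version
print(arr.data)
```

GVR applies the same pattern to distributed arrays with incremental version storage, so the application, not the system, decides what to version and when to recover.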