2015 IEEE International Conference on Cluster Computing: Latest Publications

Improving Strong-Scaling on GPU Cluster Based on Tightly Coupled Accelerators Architecture
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.154
T. Hanawa, H. Fujii, N. Fujita, Tetsuya Odajima, Kazuya Matsumoto, Yuetsu Kodama, T. Boku
The Tightly Coupled Accelerators (TCA) architecture that we proposed in previous work enables direct communication between accelerators across nodes. In this paper, we present a proof-of-concept GPU cluster called HA-PACS/TCA using the PEACH2 chip, which we designed as an interconnection router chip based on the TCA architecture. Our system demonstrated 2.0 μs latency on inter-node GPU-to-GPU communication over PCIe Gen2 x8 by RDMA, reducing the minimum latency to just 44% of that of InfiniBand QDR with MPI using GPUDirect for RDMA. Through Himeno benchmark tests, we demonstrated that the TCA architecture improves performance scalability on small problem sizes by up to 61%.
Citations: 0
IOSIG+: On the Role of I/O Tracing and Analysis for Hadoop Systems
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.17
Bo Feng, Xi Yang, Kun Feng, Yanlong Yin, Xian-He Sun
Hadoop, as one of the most widely adopted MapReduce frameworks, is naturally data-intensive, and its dependent projects, such as Mahout and Hive, inherit this characteristic. Meanwhile, I/O optimization becomes daunting work, since applications' source code is not always available. I/O traces for Hadoop and its dependents are increasingly important because they can faithfully reveal intrinsic I/O behaviors without knowledge of the source code. This method can not only help diagnose system bottlenecks but also further optimize performance. To achieve this goal, we propose a transparent tracing and analysis tool suite, named IOSIG+, which can be plugged into a Hadoop system. We make several contributions: 1) we describe our approach to tracing; 2) we release the tracer, which can trace I/O operations without modifying the target's source code; 3) we adopt several techniques to mitigate the execution overhead introduced at runtime; 4) we create an analyzer, which helps discover new approaches to address I/O problems according to access patterns. The experimental results and analysis confirm its effectiveness; the observed overhead can be as low as 1.97%.
Citations: 6
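IOSIG+'s tracer hooks I/O calls at runtime without touching application source. The general interposition idea can be sketched in plain Python by replacing the built-in `open` with a logging proxy; all names below are illustrative, not IOSIG+'s actual API:

```python
import builtins
import os
import tempfile
import time

trace_log = []            # collected records: (timestamp, op, path, nbytes)
_real_open = builtins.open

class TracedFile:
    """Proxy that logs read/write calls made through a wrapped file object."""
    def __init__(self, f, path):
        self._f, self._path = f, path

    def read(self, *args):
        data = self._f.read(*args)
        trace_log.append((time.time(), "read", self._path, len(data)))
        return data

    def write(self, data):
        n = self._f.write(data)
        trace_log.append((time.time(), "write", self._path, n))
        return n

    def __getattr__(self, name):          # delegate everything else untouched
        return getattr(self._f, name)

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        self._f.close()

def traced_open(path, mode="r", **kw):
    return TracedFile(_real_open(path, mode, **kw), path)

builtins.open = traced_open               # interpose; app code is unchanged

path = os.path.join(tempfile.gettempdir(), "iosig_demo.txt")
with open(path, "w") as f:
    f.write("hello")
with open(path) as f:
    f.read()

builtins.open = _real_open                # restore the real open
print(trace_log)
```

A production tracer would interpose at the libc or HDFS-client layer (e.g., via `LD_PRELOAD`) rather than in the interpreter, but the access-pattern log it produces has the same shape.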
Evaluation of FFT for GPU Cluster Using Tightly Coupled Accelerators Architecture
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.113
T. Hanawa, H. Fujii, N. Fujita, Tetsuya Odajima, Kazuya Matsumoto, T. Boku
Inter-node communication between accelerators in heterogeneous clusters incurs extra latency because of the time required to transfer data copies between the host and the accelerator. Such communication latencies inhibit the optimal performance of affected applications. To address this problem, we proposed the Tightly Coupled Accelerators (TCA) architecture and designed an interconnection router chip named PEACH2. Accelerators in the TCA architecture communicate directly via the PCIe protocol, the fundamental interface shared by all current accelerators and host CPUs, eliminating protocol and data-copy overheads. In this paper, we apply the TCA architecture to the Fast Fourier Transform (FFT), which is commonly used in scientific computations. First, we implemented all-to-all communication on TCA. This all-to-all communication was then applied to FFTE, one implementation of the FFT. Based on evaluation results on the HA-PACS/TCA system, we achieved a speedup of 2.7x with TCA compared with MPI using 16 nodes on the medium problem size.
Citations: 2
Exploring Memory Hierarchy to Improve Scientific Data Read Performance
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.18
Wenzhao Zhang, Houjun Tang, Xiaocheng Zou, Steve Harenberg, Qing Liu, S. Klasky, N. Samatova
Improving read performance is one of the major challenges in speeding up scientific data-analytic applications. Exploiting the memory hierarchy is one major line of research addressing the read-performance bottleneck. Related methods usually combine solid-state drives (SSDs) with dynamic random-access memory (DRAM) and/or a parallel file system (PFS) to mitigate the speed and space gap between DRAM and the PFS. However, these methods are unable to handle a key performance issue plaguing SSDs, namely read contention, which may cause up to a 50% performance reduction. In this paper, we propose a framework that exploits memory-hierarchy resources to address the read-contention issues of SSDs. The framework employs a general-purpose online read algorithm that is able to detect and utilize memory-hierarchy resources to relieve the problem. To maintain a near-optimal operating environment for the SSDs, the framework orchestrates data chunks across the different memory layers to facilitate the read algorithm. Compared to existing tools, our framework achieves up to a 50% read-performance improvement when tested on datasets from real-world scientific simulations.
Citations: 7
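The chunk orchestration described above can be illustrated with a toy two-tier read cache: a small fast tier (standing in for DRAM) in front of a larger slow tier (standing in for the SSD), with misses falling through to a backing store (the PFS). Hot chunks are promoted to the fast tier so repeated reads stop contending for the slow tier. This is a generic sketch, not the paper's algorithm:

```python
from collections import OrderedDict

class TieredReadCache:
    """Illustrative two-tier LRU read cache over a backing store."""
    def __init__(self, fast_capacity, slow_capacity, backing):
        self.fast = OrderedDict()   # chunk_id -> data, LRU order (DRAM tier)
        self.slow = OrderedDict()   # SSD tier
        self.fast_cap, self.slow_cap = fast_capacity, slow_capacity
        self.backing = backing      # dict standing in for the parallel FS
        self.hits = {"fast": 0, "slow": 0, "miss": 0}

    def read(self, chunk_id):
        if chunk_id in self.fast:
            self.fast.move_to_end(chunk_id)
            self.hits["fast"] += 1
            return self.fast[chunk_id]
        if chunk_id in self.slow:
            self.hits["slow"] += 1
            data = self.slow.pop(chunk_id)   # promote hot chunk to fast tier
            self._put(self.fast, self.fast_cap, chunk_id, data,
                      demote_to=self.slow)
            return data
        self.hits["miss"] += 1
        data = self.backing[chunk_id]        # fall through to the PFS
        self._put(self.slow, self.slow_cap, chunk_id, data)
        return data

    def _put(self, tier, cap, cid, data, demote_to=None):
        tier[cid] = data
        tier.move_to_end(cid)
        while len(tier) > cap:               # evict LRU; demotion target is
            old_id, old = tier.popitem(last=False)  # always the slow tier
            if demote_to is not None:
                self._put(demote_to, self.slow_cap, old_id, old)

store = {i: f"chunk-{i}" for i in range(10)}
cache = TieredReadCache(fast_capacity=2, slow_capacity=4, backing=store)
for cid in [0, 1, 0, 0, 2, 1]:
    cache.read(cid)
print(cache.hits)
```

The framework's online algorithm additionally monitors SSD contention to decide when to promote, which this sketch omits.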
Throughput Unfairness in Dragonfly Networks under Realistic Traffic Patterns
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.136
Pablo Fuentes, E. Vallejo, C. Camarero, R. Beivide, M. Valero
Dragonfly networks have a two-level hierarchical arrangement of the network routers and allow for a competitive cost-performance solution in large systems. Non-minimal adaptive routing is employed to fully exploit the path diversity and increase performance under adversarial traffic patterns. Throughput unfairness prevents a balanced use of the resources across the network nodes and severely degrades the performance of any application running on an affected node. Previous works have demonstrated the presence of throughput unfairness in Dragonflies under certain adversarial traffic patterns and proposed different alternatives to effectively combat this effect. In this paper we introduce a new traffic pattern, denoted adversarial consecutive (ADVc), which portrays a real use case, and evaluate its impact on network performance and throughput fairness. This traffic pattern is the most adversarial in terms of network fairness. Our evaluations, both with and without transit-over-injection priority, show that global misrouting policies do not properly alleviate this problem. Therefore, explicit fairness mechanisms are required in these networks.
Citations: 10
Optimizing Caching DSM for Distributed Software Speculation
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.68
S. C. Koduru, Keval Vora, Rajiv Gupta
Clusters with caching DSMs deliver programmability and performance by supporting shared-memory programming and tolerate remote I/O latencies via caching. The input to a data-parallel program is partitioned across the cluster while the DSM transparently fetches and caches remote data as needed. Irregular applications, however, are challenging to parallelize because input-related data dependences that manifest at runtime require the use of speculation for correct parallel execution. By speculating that there are no input-related cross-iteration dependences, private copies of the input can be processed by parallelizing the loop; the absence of dependences is then validated before the computed results are committed. We show that while caching helps tolerate long communication latencies in irregular data-parallel applications, using cached values in a computation can lead to misspeculation, and thus aggressive caching can degrade performance through an increased misspeculation rate. We present optimizations for distributed speculation on caching-based DSMs that decrease the cost of misspeculation checks and speed up the re-execution of misspeculated computations. Optimized distributed speculation achieves speedups over unoptimized speculation of 2.24x for coloring, 1.71x for connected components, 1.88x for community detection, 1.32x for shortest path, and 1.74x for PageRank.
Citations: 5
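The speculate-validate-commit cycle described in the abstract can be sketched sequentially: every iteration computes against a private snapshot of the input, and validation re-executes any iteration whose read location was written by an earlier committed iteration. A toy illustration, not the paper's DSM implementation:

```python
def run_speculative(data, tasks):
    """Toy speculate-validate-commit loop. Each 'iteration' (src, dst, f)
    reads data[src] and writes data[dst]. The speculative phase computes
    all iterations against a private snapshot; validation then re-executes
    any iteration whose read location was written by an earlier iteration,
    before committing results in iteration order."""
    snapshot = dict(data)
    # speculative phase: all iterations read the private snapshot
    spec = [(src, dst, f, f(snapshot[src])) for src, dst, f in tasks]
    written, misspeculations = set(), 0
    for src, dst, f, val in spec:
        if src in written:            # stale read detected: re-execute
            misspeculations += 1
            val = f(data[src])
        data[dst] = val               # commit
        written.add(dst)
    return misspeculations

state = {"a": 1, "b": 0, "c": 0}
# iteration 2 reads "b", which iteration 1 writes -> one misspeculation
tasks = [("a", "b", lambda x: x + 1), ("b", "c", lambda x: x * 10)]
retries = run_speculative(state, tasks)
print(state, retries)
```

The paper's contribution sits in the cost of that validation step and of the re-executions when the snapshot lives in a distributed cache rather than local memory.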
Overcoming Hadoop Scaling Limitations through Distributed Task Execution
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.42
Ke Wang, Ning Liu, Iman Sadooghi, Xi Yang, Xiaobing Zhou, Tonglin Li, M. Lang, Xian-He Sun, I. Raicu
Data-driven programming models like MapReduce have gained popularity in large-scale data processing. Although great efforts in the Hadoop implementation and framework decoupling (e.g., YARN, Mesos) have allowed Hadoop to scale to tens of thousands of commodity cluster processors, the centralized designs of the resource manager, task scheduler, and HDFS metadata management adversely affect Hadoop's scalability toward tomorrow's extreme-scale data centers. This paper aims to address the YARN scaling issues through a distributed task-execution framework, MATRIX, which was originally designed to schedule the execution of data-intensive scientific many-task computing applications on supercomputers. We propose to leverage the distributed design wisdom of MATRIX to schedule arbitrary data-processing applications in the cloud. We compare MATRIX with YARN in processing typical Hadoop workloads, such as WordCount, TeraSort, Grep, and RandomWriter, as well as the Ligand bioinformatics application, on the Amazon cloud. Experimental results show that MATRIX outperforms YARN by 1.27x for the typical workloads and by 2.04x for the real application. We also run and simulate MATRIX with fine-grained, sub-second workloads. With simulation results showing an efficiency of 86.8% at 64K cores for the 150 ms workload, we show that MATRIX has the potential to enable Hadoop to scale to extreme-scale data centers for fine-grained workloads.
Citations: 68
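One common way frameworks in this space replace a centralized scheduler with a distributed one is work stealing among per-node queues: an idle node probes a random victim and takes part of its queue, rather than asking a central master. The toy simulation below illustrates only that general idea, not MATRIX's actual protocol:

```python
import random
from collections import deque

def simulate_work_stealing(num_nodes, tasks, seed=0):
    """Toy simulation of distributed task execution: every node owns a local
    queue; an idle node steals half the queue of a randomly probed victim
    instead of waiting on a centralized scheduler (the bottleneck the paper
    targets in YARN). Returns tasks executed per node."""
    rng = random.Random(seed)
    queues = [deque() for _ in range(num_nodes)]
    for t in tasks:                        # deliberately skewed placement:
        queues[0].append(t)                # all work starts on node 0
    executed = [0] * num_nodes
    remaining = len(tasks)
    while remaining:
        for n in range(num_nodes):
            if not queues[n]:              # idle: probe a random victim
                victim = rng.randrange(num_nodes)
                if queues[victim]:
                    for _ in range(max(1, len(queues[victim]) // 2)):
                        queues[n].append(queues[victim].pop())
            if queues[n]:
                queues[n].popleft()        # execute one task this round
                executed[n] += 1
                remaining -= 1
    return executed

done = simulate_work_stealing(num_nodes=4, tasks=list(range(100)))
print(done)
```

Even with the worst-case initial skew, the steals spread the 100 tasks across the nodes without any global coordination point.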
Toward Interlanguage Parallel Scripting for Distributed-Memory Scientific Computing
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1145/2822332.2822338
J. Wozniak, Timothy G. Armstrong, K. Maheshwari, D. Katz, M. Wilde, Ian T Foster
Scripting languages such as Python and R have been widely adopted as tools for the productive development of scientific software because of the power and expressiveness of the languages and their available libraries. However, deploying scripted applications on large-scale parallel computer systems such as the IBM Blue Gene/Q or Cray XE6 is a challenge because of operating-system limitations, interoperability challenges, parallel-filesystem overheads due to the small file-system accesses common in scripted approaches, and other issues. We present here a new approach to these problems in which the Swift scripting system is used to integrate high-level scripts written in Python, R, and Tcl with native code developed in C, C++, and Fortran, by linking Swift to the library interfaces of the script interpreters. In this approach, Swift handles data management, movement, and marshaling among distributed-memory processes without direct user manipulation of low-level communication libraries such as MPI. We present a technique to efficiently launch scripted applications on large-scale supercomputers using a hierarchical programming model.
Citations: 13
Dynamic CPU Resource Allocation in Containerized Cloud Environments
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.99
J. M. Diaz, A. Landwehr, M. Taufer
In recent years, lighter-weight virtualization solutions have begun to emerge as an alternative to virtual machines. Because these solutions are still in their infancy, however, several research questions remain open regarding how to effectively manage computing resources. One important problem is the management of resources in the event of overutilization. For some applications, overutilization can severely affect performance. We provide a solution to this problem by extending the concept of timeslicing to the level of the virtualization container. Through this approach we can control and mitigate some of the more detrimental performance effects of oversubscription. Our results show significant improvement over standard scheduling with Docker.
Citations: 32
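Container runtimes expose this kind of per-container timeslicing through the kernel's CFS bandwidth controller: Docker's `--cpu-period`/`--cpu-quota` pair caps a container's CPU time per scheduling period. The helper below is a hypothetical sketch (not the paper's controller) of how a manager might shrink allocations proportionally when demands oversubscribe the machine:

```python
def cfs_quota(cpu_share, period_us=100_000):
    """Translate a fractional CPU allocation (in cores) into the CFS
    (quota_us, period_us) pair that Docker exposes via
    --cpu-quota/--cpu-period. quota=50_000 with period=100_000 throttles
    a container to half a core per period."""
    if cpu_share <= 0:
        raise ValueError("cpu_share must be positive")
    return int(cpu_share * period_us), period_us

def rebalance(demands, total_cores):
    """Hypothetical proportional rebalance: when containers' combined
    demands exceed the machine (oversubscription), shrink every
    allocation by the same factor so the sum fits."""
    total = sum(demands.values())
    scale = min(1.0, total_cores / total) if total else 1.0
    return {name: cfs_quota(d * scale) for name, d in demands.items()}

# two containers ask for 8 cores total on a 4-core host
allocs = rebalance({"web": 2.0, "batch": 6.0}, total_cores=4)
print(allocs)
# each pair could then be applied with:
#   docker update --cpu-quota <quota> --cpu-period <period> <container>
```

Proportional scaling is just one policy; the paper's point is that adjusting these knobs dynamically, rather than leaving the default fair-share scheduling, avoids the worst oversubscription effects.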
Flexible Error Recovery Using Versions in Global View Resilience
2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.88
N. Dun, H. Fujita, A. Fang, Yan Liu, A. Chien, P. Balaji, K. Iskra, Wesley Bland, A. Siegel
We present the Global View Resilience (GVR) system, a library that enables applications to add resilience in a portable, application-controlled fashion using versioned distributed arrays. We briefly describe GVR's interfaces for distributed arrays, versioning, and cross-layer error recovery. We illustrate how GVR can be used for rollback recovery and a wide range of additional error-recovery techniques, including forward recovery for latent errors or silent data corruptions. Application results demonstrate that GVR's interfaces and implementation are portable, flexible (supporting a variety of recovery models), and efficient, and create a gentle-slope path to tolerating growing error rates in future systems.
Citations: 2
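The versioning idea can be sketched in a few lines: the application commits named versions of an array and rolls back to the last good one when it detects an error. A minimal single-process illustration, not GVR's actual interface:

```python
class VersionedArray:
    """Minimal sketch of GVR-style application-controlled versioning:
    the application commits snapshots at points it knows are consistent
    and can restore any earlier version for rollback (or compare versions
    for forward recovery from latent errors)."""
    def __init__(self, data):
        self.data = list(data)
        self.versions = []               # snapshots, indexed by version id

    def commit(self):
        self.versions.append(list(self.data))
        return len(self.versions) - 1    # version id

    def restore(self, version_id):
        self.data = list(self.versions[version_id])

arr = VersionedArray([0.0] * 4)
v0 = arr.commit()
arr.data[2] = 7.5                # a computation step
v1 = arr.commit()
arr.data[2] = float("nan")       # simulate a silent data corruption
if arr.data[2] != arr.data[2]:   # NaN check: application detects the error
    arr.restore(v1)              # roll back to the last good version
print(arr.data)
```

GVR applies the same pattern to distributed arrays with incremental version storage, so the application, not the system, decides what to version and when to recover.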