2015 IEEE International Conference on Cluster Computing: Latest Papers

New Systems, New Behaviors, New Patterns: Monitoring Insights from System Standup

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.116

J. Brandt, A. Gentile, C. Martin, Jason Repik, Narate Taerat

Abstract: Disentangling significant and important log messages from those that are routine and unimportant can be a difficult task. Further, on a new system, understanding correlations between significant and possibly new types of messages and the conditions that cause them can require significant effort and time. The initial standup of a machine can provide opportunities for investigating the parameter space of events and operations and thus for gaining insight into the events of interest. In particular, failure inducement and investigation of corner-case conditions can provide knowledge of system behavior for significant issues, enabling easier diagnosis and mitigation of such issues when they actually occur during the platform lifetime. In this work, we describe the testing process and monitoring results from a testbed system in preparation for the ACES Trinity system. We describe how events in the initial standup, including changes in configuration and software and corner-case testing, have provided insights that can inform future monitoring and operating conditions, both of our test systems and the eventual large-scale Trinity system.

Citations: 7

An FPGA-Based Accelerator for Neighborhood-Based Collaborative Filtering Recommendation Algorithms

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.79

Xiang Ma, Chao Wang, Qi Yu, Xi Li, Xuehai Zhou

Citations: 10

Efficient Distributed Data Clustering on Spark

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.84

Jia Li, Dongsheng Li, Yiming Zhang

Citations: 6

RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Data

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.54

Mucahid Kutlu, G. Agrawal

Abstract: As development of high-throughput and low-cost sequencing technologies is leading to massive volumes of genomic data, new solutions for handling data-intensive applications on parallel platforms are urgently required. Particularly, the nature of processing leads to both load balancing and I/O contention challenges. In this paper, we have developed a novel middleware system, RE-PAGE, which allows parallelization of applications that process genomic data with a simple, high-level API. To address load balancing and I/O contention, the features of the middleware include: 1) use of domain-specific information in the formation of data chunks (which can be of non-uniform sizes), 2) replication and placement of each chunk on a small number of nodes, performed in an intelligent way, and 3) scheduling schemes for achieving load balance when data movement costs outweigh processing costs and the chunks are of non-uniform sizes. We have evaluated our framework using three genomic applications: VarScan, Unified Genotyper, and Coverage Analyzer. We show that our approach leads to better performance than conventional MapReduce scheduling approaches and systems that access data from a centralized store. We also compare against popular frameworks, Hadoop and GATK, and show that our middleware outperforms both, achieving high parallel efficiency and scalability.

Citations: 0

Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.13

Lorenz Fischer, Shen Gao, A. Bernstein

Citations: 22

Multipath Load Balancing for M × N Communication Patterns on the Blue Gene/Q Supercomputer Interconnection Network

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.140

Huy Bui, R. Jacob, Preeti Malakar, V. Vishwanath, Andrew E. Johnson, M. Papka, J. Leigh

Abstract: Achievable networking performance of applications in a supercomputer depends on the exact combination of the communication patterns of the applications and the routing algorithms used by the supercomputer. In order to achieve the highest networking performance for the applications, the routing algorithms need to be designed optimally for those communication patterns. However, while communication patterns usually vary widely from application to application and even from phase to phase within an application, routing algorithms vary little and are usually optimized for typical communication patterns. This results in high networking performance for favored communication patterns but low networking performance for others. In this paper we present approaches for improving networking performance by rebalancing load on physical links on the Blue Gene/Q supercomputer. We realize our approaches in a framework called OPTIQ and demonstrate the efficacy of our framework via a set of benchmarks. Our results show that we can achieve 30% higher throughput in an experiment with data and patterns from a real application. For certain communication patterns, throughput can be several times higher than that of the default MPI_Alltoallv used on the Blue Gene/Q supercomputer.

Citations: 3

A Cache Management Scheme for Hiding Garbage Collection Latency in Flash-Based Solid State Drives

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.75

Wei Xie, Yong Chen

Citations: 3

Comparison of Vendor Supplied Environmental Data Collection Mechanisms

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.120

Sean Wallace, V. Vishwanath, S. Coghlan, Z. Lan, M. Papka

Citations: 4

Practical Resource Monitoring for Robust High Throughput Computing

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.115

G. Juve, Benjamín Tovar, Rafael Ferreira da Silva, Dariusz Król, D. Thain, E. Deelman, W. Allcock, M. Livny

Abstract: Robust high throughput computing requires effective monitoring and enforcement of a variety of resources including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs, and the execution system ensures the availability of such resources. This presents two non-trivial problems: how to measure the resources consumed by a task, and how to monitor and report resource exhaustion in a robust and timely manner. For both of these problems, operating systems have a variety of mechanisms with different degrees of availability, accuracy, overhead, and intrusiveness. We describe various forms of monitoring and the available mechanisms in contemporary operating systems. We then present two specific monitoring tools that choose different tradeoffs in overhead and accuracy, and evaluate them on a selection of benchmarks.

Citations: 29

Fault-Tolerant Protocol for Hybrid Task-Parallel Message-Passing Applications

2015 IEEE International Conference on Cluster Computing Pub Date : 2015-09-08 DOI: 10.1109/CLUSTER.2015.104

Tatiana V. Martsinkevich, Omer Subasi, O. Unsal, F. Cappello, Jesús Labarta

Citations: 14