J. Brandt, A. Gentile, C. Martin, Jason Repik, Narate Taerat
{"title":"New Systems, New Behaviors, New Patterns: Monitoring Insights from System Standup","authors":"J. Brandt, A. Gentile, C. Martin, Jason Repik, Narate Taerat","doi":"10.1109/CLUSTER.2015.116","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.116","url":null,"abstract":"Disentangling significant and important log messages from those that are routine and unimportant can be a difficult task. Further, on a new system, understanding correlations between significant and possibly new types of messages and conditions that cause them can require significant effort and time. The initial standup of a machine can provide opportunities for investigating the parameter space of events and operations and thus for gaining insight into the events of interest. In particular, failure inducement and investigation of corner case conditions can provide knowledge of system behavior for significant issues that will enable easier diagnosis and mitigation of such issues when they actually occur during the platform lifetime. In this work, we describe the testing process and monitoring results from a testbed system in preparation for the ACES Trinity system. We describe how events in the initial standup, including changes in configuration and software and corner case testing, have provided insights that can inform future monitoring and operating conditions, both of our test systems and the eventual large-scale Trinity system.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"44 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123434950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An FPGA-Based Accelerator for Neighborhood-Based Collaborative Filtering Recommendation Algorithms","authors":"Xiang Ma, Chao Wang, Qi Yu, Xi Li, Xuehai Zhou","doi":"10.1109/CLUSTER.2015.79","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.79","url":null,"abstract":"Neighborhood-based Collaborative Filtering (CF) is a family of techniques in the field of recommendation algorithms and has been widely used in many personalized recommender systems. In the big data era, increasing data volumes make these CF recommendation algorithms time-consuming and energy-inefficient. At present, cloud computing and Graphics Processing Units (GPUs) are the two major platforms used to accelerate CF algorithms. However, both platforms have notable shortcomings in efficiency and power consumption. To solve these problems, we investigate three neighborhood-based CF algorithms and design a general and flexible accelerator for them based on a Field Programmable Gate Array (FPGA). This accelerator cooperates with the host CPU and accelerates the primary time-consuming parts that these algorithms share. Experimental results show that our accelerator can significantly improve acceleration efficiency at an affordable hardware cost and with lower energy consumption.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"103 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116242191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Distributed Data Clustering on Spark","authors":"Jia Li, Dongsheng Li, Yiming Zhang","doi":"10.1109/CLUSTER.2015.84","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.84","url":null,"abstract":"Data clustering is usually time-consuming since it by default needs to iteratively aggregate and process large volumes of data. Approximate aggregation based on samples provides fast, quality-assured results. In this paper, we propose to leverage approximation techniques in data clustering to obtain a trade-off between clustering efficiency and result quality, along with online accuracy estimation. The proposed method is based on bootstrap trials. We implemented this method as an Intelligent Bootstrap Library (IBL) on Spark to support efficient data clustering. Intensive evaluations show that IBL can provide a 2x speed-up over the state-of-the-art solution with the same error bound.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124970901","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RE-PAGE: Domain-Specific REplication and PArallel Processing of GEnomic Data","authors":"Mucahid Kutlu, G. Agrawal","doi":"10.1109/CLUSTER.2015.54","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.54","url":null,"abstract":"As development of high-throughput and low-cost sequencing technologies is leading to massive volumes of genomic data, new solutions for handling data-intensive applications on parallel platforms are urgently required. Particularly, the nature of processing leads to both load balancing and I/O contention challenges. In this paper, we have developed a novel middleware system, RE-PAGE, which allows parallelization of applications that process genomic data with a simple, high-level API. To address load balancing and I/O contention, the features of the middleware include: 1) use of domain-specific information in the formation of data chunks (which can be of non-uniform sizes), 2) replication and placement of each chunk on a small number of nodes, performed in an intelligent way, and 3) scheduling schemes for achieving load balance, when data movement costs out-weigh processing costs and the chunks are of non-uniform sizes. We have evaluated our framework using three genomic applications, which are VarScan, Unified Genotyper, and Coverage Analyzer. We show that our approach leads to better performance than conventional MapReduce scheduling approaches and systems that access data from a centralized store. We also compare against popular frameworks, Hadoop and GATK, and show that our middleware outperforms both, achieving high parallel efficiency and scalability.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"90 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129038923","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machines Tuning Machines: Configuring Distributed Stream Processors with Bayesian Optimization","authors":"Lorenz Fischer, Shen Gao, A. Bernstein","doi":"10.1109/CLUSTER.2015.13","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.13","url":null,"abstract":"Modern distributed computing frameworks such as Apache Hadoop, Spark, or Storm distribute the workload of applications across a large number of machines. Whilst they abstract the details of distribution they do require the programmer to set a number of configuration parameters before deployment. These parameter settings (usually) have a substantial impact on execution efficiency. Finding the right values for these parameters is considered a difficult task and requires domain, application, and framework expertise. In this paper, we propose a machine learning approach to the problem of configuring a distributed computing framework. Specifically, we propose using Bayesian Optimization to find good parameter settings. In an extensive empirical evaluation, we show that Bayesian Optimization can effectively find good parameter settings for four different stream processing topologies implemented in Apache Storm resulting in significant gains over a parallel linear approach.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129210525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huy Bui, R. Jacob, Preeti Malakar, V. Vishwanath, Andrew E. Johnson, M. Papka, J. Leigh
{"title":"Multipath Load Balancing for M × N Communication Patterns on the Blue Gene/Q Supercomputer Interconnection Network","authors":"Huy Bui, R. Jacob, Preeti Malakar, V. Vishwanath, Andrew E. Johnson, M. Papka, J. Leigh","doi":"10.1109/CLUSTER.2015.140","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.140","url":null,"abstract":"Achievable networking performance of applications in a supercomputer depends on the exact combination of the communication patterns of the applications and the routing algorithms used by the supercomputer. In order to achieve the highest networking performance for the applications the routing algorithms need to be designed optimally for those communication patterns. However, while communication patterns usually have a wide variation from application to application and even from phase to phase in an application, routing algorithms have a limited variation and usually are optimized for typical communication patterns. This results in high networking performance for favored communication patterns but low networking performance for others. In this paper we present approaches for improving networking performance by rebalancing load on physical links on the Blue Gene/Q supercomputer. We realize our approaches in a framework called OPTIQ and demonstrate the efficacy of our framework via a set of benchmarks. Our results show that we can achieve 30% higher throughput in an experiment with data and patterns from a real application. For certain communication patterns, throughput can be several times higher than that of the default MPI_Alltoallv used on the Blue Gene/Q supercomputer.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129238758","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Cache Management Scheme for Hiding Garbage Collection Latency in Flash-Based Solid State Drives","authors":"Wei Xie, Yong Chen","doi":"10.1109/CLUSTER.2015.75","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.75","url":null,"abstract":"Recent advancements in flash-based solid state drives (SSDs) make them highly desirable storage devices, especially for data-intensive applications. Significantly more SSDs are now used in data centers and high performance computing systems. SSDs generally perform one or two orders of magnitude better than traditional hard disk drives. However, the performance of random writes on SSDs, especially small random writes, is still largely limited by the garbage collection (GC) process. Existing work has tried to use the on-device RAM as a write cache to improve write performance; however, using it directly as a normal write cache underutilizes the RAM cache. In this poster, we present our initial study of a cache management scheme that hides the GC latency.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"338 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123185975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sean Wallace, V. Vishwanath, S. Coghlan, Z. Lan, M. Papka
{"title":"Comparison of Vendor Supplied Environmental Data Collection Mechanisms","authors":"Sean Wallace, V. Vishwanath, S. Coghlan, Z. Lan, M. Papka","doi":"10.1109/CLUSTER.2015.120","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.120","url":null,"abstract":"The high performance computing landscape is filled with diverse hardware components. A large part of understanding how these components compare to others is looking at the various environmental aspects of these devices such as power consumption, temperature, etc. Thankfully, vendors of these various pieces of hardware have supported this by providing mechanisms to obtain this data. However, differences are common between products, not only in the way this data is obtained but also in the data that is provided. In this paper, we take a comprehensive look at the data which is available for the most common pieces of today's HPC landscape, as well as how this data is obtained and how accurate it is. Having surveyed these components, we compare and contrast them, noting key differences as well as providing insight into what features future components should have.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114352387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
G. Juve, Benjamín Tovar, Rafael Ferreira da Silva, Dariusz Król, D. Thain, E. Deelman, W. Allcock, M. Livny
{"title":"Practical Resource Monitoring for Robust High Throughput Computing","authors":"G. Juve, Benjamín Tovar, Rafael Ferreira da Silva, Dariusz Król, D. Thain, E. Deelman, W. Allcock, M. Livny","doi":"10.1109/CLUSTER.2015.115","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.115","url":null,"abstract":"Robust high throughput computing requires effective monitoring and enforcement of a variety of resources including CPU cores, memory, disk, and network traffic. Without effective monitoring and enforcement, it is easy to overload machines, causing failures and slowdowns, or underutilize machines, which results in wasted opportunities. This paper explores how to describe, measure, and enforce resources used by computational tasks. We focus on tasks running in distributed execution systems, in which a task requests the resources it needs, and the execution system ensures the availability of such resources. This presents two non-trivial problems: how to measure the resources consumed by a task, and how to monitor and report resource exhaustion in a robust and timely manner. For both of these tasks, operating systems have a variety of mechanisms with different degrees of availability, accuracy, overhead, and intrusiveness. We describe various forms of monitoring and the available mechanisms in contemporary operating systems. We then present two specific monitoring tools that choose different tradeoffs in overhead and accuracy, and evaluate them on a selection of benchmarks.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131076486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tatiana V. Martsinkevich, Omer Subasi, O. Unsal, F. Cappello, Jesús Labarta
{"title":"Fault-Tolerant Protocol for Hybrid Task-Parallel Message-Passing Applications","authors":"Tatiana V. Martsinkevich, Omer Subasi, O. Unsal, F. Cappello, Jesús Labarta","doi":"10.1109/CLUSTER.2015.104","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.104","url":null,"abstract":"We present a fault-tolerant protocol for task-parallel message-passing applications to mitigate transient errors. The protocol requires the restart only of the task that experienced the error and transparently handles any MPI calls inside the task. The protocol is implemented in Nanos -- a dataflow runtime for task-based OmpSs programming model -- and the PMPI profiling layer to fully support hybrid OmpSs+MPI applications. In our experiments we demonstrate that our fault-tolerant solution has a reasonable overhead, with a maximum observed overhead of 4.5%. We also show that fine-grained parallelization is important for hiding the overheads related to the protocol as well as the recovery of tasks.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"418 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133579647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}