{"title":"A LogP Extension for Modeling Tree Aggregation Networks","authors":"Taylor L. Groves, S. Gutierrez, D. Arnold","doi":"10.1109/CLUSTER.2015.117","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.117","url":null,"abstract":"As high-performance systems continue to expand in power and size, scalable communication and data transfer are necessary to facilitate next-generation monitoring and analysis. Many popular frameworks such as MapReduce, MPI and MRNet utilize scalable reduction operations to fulfill the performance requirements of a large distributed system. The structures to handle these aggregations may simply consist of a single level with children reporting directly to the parent node, or they may be layered to create a large tree with varying breadth and height. Despite their prevalence, the techniques for modeling these Tree Aggregation Networks (TANs) are lacking. This paper addresses this need by introducing a novel extension of the LogP framework for Tree Aggregation Networks. Our TAN model adheres to the simplicity of the LogP model, but utilizes structural insights to provide a simple yet precise performance estimate. Additionally, our model makes no assumptions about the underlying NIC transfer mechanisms or uniformity of tree breadth, making it suitable for a wide range of environments. 
To evaluate our TAN model, we compare it against the traditional LogP model for predicting the performance of the Multicast Reduction Network (MRNet) framework.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126670066","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Multi-site Metadata Management for Geographically Distributed Cloud Workflows","authors":"Luis Pineda-Morales, Alexandru Costan, Gabriel Antoniu","doi":"10.1109/CLUSTER.2015.49","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.49","url":null,"abstract":"With their globally distributed datacenters, clouds now provide an opportunity to run complex large-scale applications on dynamically provisioned, networked and federated infrastructures. However, there is a lack of tools supporting data intensive applications across geographically distributed sites. For instance, scientific workflows which handle many small files can easily saturate state-of-the-art distributed filesystems based on centralized metadata servers (e.g. HDFS, PVFS). In this paper, we explore several alternative design strategies to efficiently support the execution of existing workflow engines across multi-site clouds, by reducing the cost of metadata operations. These strategies leverage workflow semantics in a 2-level metadata partitioning hierarchy that combines distribution and replication. The system was validated on the Microsoft Azure cloud across 4 EU and US datacenters. The experiments were conducted on 128 nodes using synthetic benchmarks and real-life applications. 
We observe as much as 28% gain in execution time for a parallel, geo-distributed real-world application (Montage) and up to 50% for a metadata-intensive synthetic benchmark, compared to a baseline centralized configuration.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126949625","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards the InfiniBand SR-IOV vSwitch Architecture","authors":"Evangelos Tasoulas, Ernst Gunnar Gran, Bjørn Dag Johnsen, Kyrre M. Begnum, T. Skeie","doi":"10.1109/CLUSTER.2015.58","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.58","url":null,"abstract":"To meet the demands of the Exascale era and facilitate Big Data analytics in the cloud while maintaining flexibility, cloud providers will have to offer efficient virtualized High Performance Computing clusters in a pay-as-you-go model. As a consequence, high performance network interconnect solutions, like InfiniBand (IB), will be beneficial. Currently, the only way to provide IB connectivity on Virtual Machines (VMs) is by utilizing direct device assignment. At the same time, to be scalable, Single-Root I/O Virtualization (SR-IOV) is used. However, the current SR-IOV model employed by IB adapters is a Shared Port implementation with limited flexibility, as it does not allow transparent virtualization and live-migration of VMs. In this paper, we explore an alternative SR-IOV model for IB, the virtual switch (vSwitch), and propose and analyze two vSwitch implementations with different scalability characteristics. Furthermore, as network reconfiguration time is critical to make live-migration a practical option, we accompany our proposed architecture with a scalable and topology-agnostic dynamic reconfiguration method, implemented and tested using OpenSM. 
Our results show that we are able to significantly reduce the reconfiguration time as route recalculations are no longer needed, and in large IB subnets, for certain scenarios, the number of reconfiguration subnet management packets (SMPs) sent is reduced from several hundred thousand down to a single one.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127969583","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LU Factorization: Towards Hiding Communication Overheads with a Lookahead-Free Algorithm","authors":"T. Nguyen, S. Baden","doi":"10.1109/CLUSTER.2015.61","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.61","url":null,"abstract":"Lookahead is a well-known technique for masking communication in matrix factorization, but at the cost of complicating application software. We present a new approach, based on automated code-restructuring, that realizes the benefits of lookahead while avoiding the complications. We apply our technique to HPL, the Linpack benchmark used to assess the performance of supercomputers. Starting with the simpler non-lookahead version of the application, we are able to meet the performance of lookahead on the Stampede mainframe.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130620091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Two-Tiered Approach to I/O Quality of Service in Docker Containers","authors":"Sean McDaniel, Stephen Herbein, M. Taufer","doi":"10.1109/CLUSTER.2015.77","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.77","url":null,"abstract":"Linux containers allow applications to run in complete isolation from one another without the extra overhead of running entirely separate operating systems. This approach eliminates memory overheads associated with virtualization and virtual machines and helps businesses run their day-to-day applications. Unfortunately, multiple applications sharing the same resources can result in substantial resource contention among the applications in the containers and substantial performance loss. One way to mitigate this loss in performance is by ensuring quality of service (QoS), guaranteeing that the application of interest meets the performance requirements. Existing work targets ways of managing CPU, network, and memory contention; however, no solutions exist for managing contention associated with I/O. To address the I/O contention challenge in containers, we propose a two-tiered approach (i.e., at both the cluster and node levels) that extends Docker and Docker Swarm, making both capable of monitoring and controlling the I/O of Docker containers. 
We demonstrate how our two-tiered approach has the potential for higher resource utilization without the effects of contention.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"2013 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128224447","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PaRSEC in Practice: Optimizing a Legacy Chemistry Application through Distributed Task-Based Execution","authors":"Anthony Danalis, Heike Jagode, G. Bosilca, J. Dongarra","doi":"10.1109/CLUSTER.2015.50","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.50","url":null,"abstract":"Task-based execution has been growing in popularity as a means to deliver a good balance between performance and portability in the post-petascale era. The Parallel Runtime Scheduling and Execution Control (PARSEC) framework is a task-based runtime system that we designed to achieve high performance computing at scale. PARSEC offers a programming paradigm that is different than what has been traditionally used to develop large scale parallel scientific applications. In this paper, we discuss the use of PARSEC to convert a part of the Coupled Cluster (CC) component of the Quantum Chemistry package NWCHEM into a task-based form. We explain how we organized the computation of the CC methods in individual tasks with explicitly defined data dependencies between them and re-integrated the modified code into NWCHEM. We present a thorough performance evaluation and demonstrate that the modified code outperforms the original by more than a factor of two. 
We also compare the performance of different variants of the modified code and explain the different behaviors that lead to the differences in performance.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"882 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132933880","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing MiniApps on Modern Platforms Using Multiple Programming Models","authors":"O. E. Messer, E. D'Azevedo, Judith C. Hill, W. Joubert, S. Laosooksathit, A. Tharrington","doi":"10.1109/CLUSTER.2015.130","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.130","url":null,"abstract":"We have developed a set of reduced, proxy applications (\"MiniApps\") based on large-scale application codes supported at the Oak Ridge Leadership Computing Facility (OLCF). The MiniApps are designed to encapsulate the details of the most important (i.e. the most time-consuming and/or unique) facets of the applications that run in production mode on the OLCF. In each case, we have produced or plan to produce individual versions of the MiniApps using different specific programming models (e.g., OpenACC, CUDA, OpenMP). We describe some of our initial observations regarding these different implementations along with estimates of how closely the MiniApps track the actual performance characteristics (in particular, the overall scalability) of the large-scale applications from which they are derived.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"32 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133230357","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pallas: An Application-Driven Task and Network Simulation Framework","authors":"Yuming Ye, Ziyang Li, Dongsheng Li, Yiming Zhang, Feng Liu, Yuxing Peng","doi":"10.1109/CLUSTER.2015.97","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.97","url":null,"abstract":"With the help of simulation tools, users can evaluate new proposals in cluster environments efficiently. However, current cloud simulators cannot meet the needs of application-driven simulation scenarios. In this paper, we propose Pallas, a task and network simulation framework that supports various cloud applications. Task-aware network scheduling and network-perceived task placement algorithms can be easily implemented in Pallas. We present the architecture and main components of Pallas and evaluate its effectiveness by comparing algorithm improvements to the actual results.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"68 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133927876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ObsCon: Integrated Monitoring and Control for Parallel, Real-Time Applications","authors":"A. Nussbaum, Shwetha Mathangi Chandra Choodamani, K. Schwan","doi":"10.1109/CLUSTER.2015.72","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.72","url":null,"abstract":"A large class of emerging compute-intensive applications demand real-time or near real-time processing guarantees on streaming data. Sensor processing, in particular, has stringent latency requirements for carrying out its digital processing for rapidly incoming radar data streams. The consequent demands on the cluster middleware used to run such codes include (i) efficient online observation of current application performance, coupled with (ii) highly responsive controllers able to dynamically adjust the application's input- and data-dependent runtime behavior. We present the Obs(erver)Con(troller) software for online monitoring and control, which, based on specifications of acceptable application states and tunable knobs within the execution environment, ensures that application performance falls within acceptable limits. ObsCon topologies are dynamic, making possible the runtime association of ObsCon methods with arbitrary DAG-structured, distributed/parallel stream processing applications running on high-end cluster machines. 
This paper describes the ObsCon software and its 'grey box' use with a high performance cluster code that exports to ObsCon select 'hooks' for online monitoring and control -- Adaptive Digital Beamforming for a phased-array radar system.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127803602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Queuing Schemes for HoL-Blocking Reduction in Dragonfly Topologies with Minimal-Path Routing","authors":"P. Yébenes, J. Escudero-Sahuquillo, P. García, F. Quiles","doi":"10.1109/CLUSTER.2015.138","DOIUrl":"https://doi.org/10.1109/CLUSTER.2015.138","url":null,"abstract":"HPC systems are growing in number of connected endnodes, making the network a main issue in their design. In order to interconnect large systems, dragonfly topologies have become very popular in recent years as they achieve high scalability by exploiting high-radix switches. However, the high performance of dragonfly networks may drop severely due to the Head-of-Line (HoL) blocking effect derived from congestion situations. Many techniques have been proposed for dealing with this harmful effect, the most effective ones being those especially designed for a specific topology and a specific routing algorithm. In this paper we present a queuing scheme called Hierarchical Two-Levels Queuing, designed specially to reduce HoL blocking in fully-connected dragonfly networks that use minimal-path routing. This proposal boosts network performance compared with other techniques and requires fewer network resources than the others. In addition, we explain an upgrade to existing queuing schemes that improves their performance.","PeriodicalId":187042,"journal":{"name":"2015 IEEE International Conference on Cluster Computing","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127909375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}