2015 IEEE International Conference on Cluster Computing: Latest Publications

Parallel Modularity-Based Community Detection on Large-Scale Graphs

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.11

Jianping Zeng, Hongfeng Yu

Citations: 20

Balancing Thread-Level and Task-Level Parallelism for Data-Intensive Workloads on Clusters and Clouds

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.60

Olivia Choudhury, D. Rajan, Nicholas L. Hazekamp, S. Gesing, D. Thain, S. Emrich

Citations: 6

Pairwise Sequence Alignment with Gaps with GPU

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.109

T. Carroll, Jude-Thaddeus Ojiaku, Prudence W. H. Wong

Citations: 4

Towards Building a Lightweight Key-Value Store on Parallel File System

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.100

Jiaan Zeng, Beth Plale

Abstract: As data grows in number and size, big data applications are beginning to revolutionize the underlying storage systems. On one hand, key-value stores have prevailed as the back-end storage for big data applications owing to their schema-less data model, high scalability, and other properties. On the other hand, parallel file systems shared by multiple nodes offer large-capacity, high-throughput, and high-bandwidth access and are widely used in high performance computing (HPC) and cloud computing environments. In this paper, we explore the opportunity of building a lightweight key-value store that supports concurrent access over a parallel file system. The proposed key-value store relies on the shared nature of the parallel file system to provide distributed access. Instead of organizing a cluster of nodes running long-lived services to mediate access, our key-value store simply embeds itself into applications and requires neither long-running services nor communication between nodes. Such a design not only simplifies the structure of a distributed key-value store but also avoids the overhead of running services on top of the file system. We implemented a prototype of this system and compared it against Cassandra, a state-of-the-art key-value store. Preliminary results are promising.

Citations: 2
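The abstract describes a key-value store that is embedded in the application and relies on a shared parallel file system instead of server processes. As a rough illustration only, the Python sketch below uses a swapped-in technique, one file per key plus an atomic rename on every write, so that any process mounting the same directory can read and write without a daemon or inter-node messages. The class `SharedDirKV` and its file layout are hypothetical and are not the authors' prototype; the sketch assumes the file system honors POSIX `rename` atomicity within one directory, which should be verified for any particular parallel file system.

```python
import os
import tempfile
from typing import Optional


class SharedDirKV:
    """Hypothetical embedded key-value store over a shared directory.

    Each key maps to one file. A write goes to a unique temporary file that
    is then renamed over the target, so readers see either the old or the
    new value, never a partial write. No server process is involved: every
    application process simply opens the same directory.
    """

    def __init__(self, root: str) -> None:
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Hex-encode the key so arbitrary strings (including "" and "/")
        # become safe file names; the "k" prefix keeps them distinct from
        # the ".tmp-" scratch files.
        return os.path.join(self.root, "k" + key.encode("utf-8").hex())

    def put(self, key: str, value: bytes) -> None:
        fd, tmp = tempfile.mkstemp(dir=self.root, prefix=".tmp-")
        try:
            with os.fdopen(fd, "wb") as f:
                f.write(value)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, self._path(key))  # atomic rename on POSIX
        except BaseException:
            if os.path.exists(tmp):
                os.remove(tmp)
            raise

    def get(self, key: str) -> Optional[bytes]:
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def delete(self, key: str) -> bool:
        try:
            os.remove(self._path(key))
            return True
        except FileNotFoundError:
            return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared_dir:  # stand-in for a PFS mount
        kv = SharedDirKV(shared_dir)
        kv.put("run-42/status", b"done")
        print(kv.get("run-42/status"))  # b'done'
```

Because the store is only a library over a shared directory, processes on different nodes that open `SharedDirKV` on the same mount see each other's writes without any coordination service. The cost is that every operation becomes file-system metadata traffic, which a production design would likely need to address.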

A Team-Based Methodology of Memory Hierarchy-Aware Runtime Support in Coarray Fortran

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.67

Dounia Khaldi, Deepak Eachempati, Shiyao Ge, P. Jouvelot, B. Chapman

Citations: 2

Network Quality of Service in Docker Containers

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.96

Ayush Dusia, Yang Yang, M. Taufer

Citations: 29

Fast and Accurate Support Vector Machines on Large Scale Systems

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.26

Abhinav Vishnu, Jeyanthi Narasimhan, L. Holder, D. Kerbyson, A. Hoisie

Abstract: Support Vector Machine (SVM) is a supervised Machine Learning and Data Mining (MLDM) algorithm that has become ubiquitous largely due to its high accuracy and obliviousness to dimensionality. The objective of SVM is to find an optimal boundary, also known as a hyperplane, that separates the samples (examples in a dataset) of different classes by a maximum margin. Usually, very few samples contribute to the definition of the boundary. However, existing parallel algorithms use the entire dataset to find the boundary, which is sub-optimal for performance. In this paper, we propose a novel distributed memory algorithm that eliminates the samples which do not contribute to the boundary definition in SVM. We propose several heuristics, ranging from early (aggressive) to late (conservative) elimination of samples, such that the overall time for generating the boundary is reduced considerably. In a few cases, a sample may be eliminated (shrunk) pre-emptively, potentially resulting in an incorrect boundary. We propose a scalable approach to synchronize the necessary data structures such that the proposed algorithm maintains its accuracy. We consider the necessary trade-offs of single versus multiple synchronization using in-depth time-space complexity analysis. We implement the proposed algorithm using MPI and compare it with libsvm, the de facto sequential SVM software, which we enhance with OpenMP for multi-core/many-core parallelism. Our proposed approach shows excellent efficiency using up to 4096 processes on several large datasets such as the UCI HIGGS Boson dataset and the Offending URL dataset.

Citations: 15
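The elimination (shrinking) idea in this abstract can be illustrated on a toy, single-process linear SVM. The sketch below is an assumption-laden stand-in, not the authors' MPI algorithm: it minimizes the primal hinge-loss objective with full-batch subgradient steps of size 1/(lam * t), drops samples whose margin exceeds `shrink_threshold` once `shrink_after` epochs of a round have passed (lower `shrink_after` means more aggressive elimination), and afterwards re-checks every eliminated sample against the current model, reinstating any that were shrunk pre-emptively and now violate the margin. The function names, parameters, and objective are hypothetical choices for illustration.

```python
from typing import List, Sequence, Set, Tuple


def margin(w: Sequence[float], x: Sequence[float], label: int) -> float:
    """Functional margin label * (w . x); a value >= 1 means zero hinge loss."""
    return label * sum(wj * xj for wj, xj in zip(w, x))


def train_with_shrinking(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.1,
    epochs: int = 100,
    shrink_after: int = 10,         # smaller => earlier, more aggressive elimination
    shrink_threshold: float = 2.0,  # larger => more conservative elimination
    max_rounds: int = 5,
    tol: float = 1e-3,
) -> Tuple[List[float], Set[int], bool]:
    """Toy linear SVM (no bias term) with sample elimination and re-checking.

    Minimizes lam/2 * ||w||^2 + (1/n) * sum_i max(0, 1 - y_i * w.x_i).
    Returns the weights, the final active set, and whether the last
    re-check found no wrongly eliminated samples.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    active: Set[int] = set(range(n))
    t = 0
    for _ in range(max_rounds):
        for epoch in range(1, epochs + 1):
            t += 1
            eta = 1.0 / (lam * t)
            # Subgradient over active samples only; eliminated samples are
            # assumed to lie outside the margin, where hinge loss is zero.
            grad = [lam * wj for wj in w]
            for i in active:
                if margin(w, X[i], y[i]) < 1.0:
                    for j in range(d):
                        grad[j] -= y[i] * X[i][j] / n
            w = [wj - eta * gj for wj, gj in zip(w, grad)]
            if epoch >= shrink_after:
                active = {i for i in active
                          if margin(w, X[i], y[i]) <= shrink_threshold}
        # Re-check: eliminated samples that now violate the margin were
        # shrunk pre-emptively; reinstate them and run another round.
        violators = {i for i in range(n)
                     if i not in active and margin(w, X[i], y[i]) < 1.0 - tol}
        if not violators:
            return w, active, True
        active |= violators
    return w, active, False
```

On the small symmetric dataset X = [[2, 1], [1, 3], [4, 4], [-2, -1], [-1, -3], [-4, -4]] with labels [1, 1, 1, -1, -1, -1], the aggressive defaults eliminate every sample after ten epochs; the first re-check then reinstates the pair [2, 1] / [-2, -1], and the far-away pair [4, 4] / [-4, -4] stays eliminated. The re-check step is what keeps aggressive elimination from silently producing a wrong boundary, at the cost of an extra pass over the eliminated samples.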

Extending LDMS to Enable Performance Monitoring in Multi-core Applications

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.125

S. Feldman, Deli Zhang, D. Dechev, J. Brandt

Abstract: Identifying design patterns that limit the performance of multi-core algorithms is a challenging task. There are many known methods by which threads synchronize their actions, and each method may exhibit different behavior in different use cases. These use cases may vary with regard to the workload being executed, the number of parallel tasks, the dependencies between these tasks, and the behavior of the system scheduler. Restructuring algorithms to overcome performance limitations requires intimate knowledge of how these algorithms utilize the hardware. In our experience, adequate tools for gaining such knowledge are lacking. To address this, we have enhanced and implemented additional data sampler modules for OVIS's Lightweight Distributed Metric Service (LDMS) to enable scalable, distributed collection of hardware performance counter data. These modules provide an interface through which LDMS can use the PAPI library, Linux perf tools, and RAPL to collect hardware performance data of interest. Using these samplers, we plan to monitor the intra-node behavior of multi-core applications, including contention for node-level shared resources, across a diverse set of use cases. We are currently exploring how the reported values are affected by the level of concurrency, the synchronization methodologies, and progress guarantees. We hope to use this information to identify ways to restructure algorithms to increase their performance.

Citations: 7
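The abstract lists RAPL as one of the hardware data sources behind the new LDMS samplers. The Python sketch below shows, under stated assumptions, what a minimal RAPL reading could look like on Linux through the powercap sysfs files `energy_uj` and `max_energy_range_uj`: take two energy readings, correct for counter wrap-around, and divide by the elapsed time to get average power. It is not the LDMS sampler plugin interface and does not use PAPI or perf; the path `RAPL_DIR` and the helper names are assumptions for illustration, and on many systems these files are readable only by root.

```python
import os
import time
from typing import Optional

# Assumed sampler target: the first RAPL package domain exposed by the Linux
# powercap framework. Whether this directory exists (and whether it is
# readable without root) depends on the CPU, kernel, and configuration.
RAPL_DIR = "/sys/class/powercap/intel-rapl:0"


def read_uj(path: str) -> Optional[int]:
    """Read one integer counter (microjoules) from sysfs, or None on failure."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def energy_delta_uj(before: int, after: int, max_range_uj: int) -> int:
    """Energy consumed between two readings, allowing for one counter wrap."""
    if after >= before:
        return after - before
    return (max_range_uj - before) + after


def sample_package_power_w(interval_s: float = 1.0) -> Optional[float]:
    """Average package power in watts over interval_s, or None if unavailable."""
    energy_path = os.path.join(RAPL_DIR, "energy_uj")
    max_range = read_uj(os.path.join(RAPL_DIR, "max_energy_range_uj"))
    before = read_uj(energy_path)
    if max_range is None or before is None:
        return None
    t0 = time.monotonic()
    time.sleep(interval_s)
    after = read_uj(energy_path)
    elapsed = time.monotonic() - t0
    if after is None or elapsed <= 0:
        return None
    return energy_delta_uj(before, after, max_range) / 1e6 / elapsed


if __name__ == "__main__":
    power = sample_package_power_w(0.5)
    if power is None:
        print("RAPL counters not readable on this machine")
    else:
        print(f"package power ~ {power:.1f} W")
```

A real sampler would read many counters on a fixed schedule and ship them to an aggregator rather than print them; the wrap-around correction matters because the energy counter is bounded and resets during long runs.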

The Performance Implication of Task Size for Applications on the HPX Runtime System

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.119

Patricia A. Grubel, Hartmut Kaiser, Jeanine E. Cook, Adrian Serio

Abstract: As High Performance Computing moves toward Exascale, where parallel applications will be expected to run on millions of cores concurrently, every component of the computational model must perform optimally. One such component, the task scheduler, can potentially be optimized for runtime application requirements. We focus our study on a task-based runtime system, one possible path toward Exascale computation. The overheads associated with task scheduling vary with task size and scheduler. Therefore, to minimize overheads and optimize performance, either the task size or the scheduler must adapt. In this paper, we focus on adapting the task size, which can easily be done statically and potentially done dynamically. To this end, we first show how scheduling overheads change with task size, or granularity. We then propose and execute a methodology to characterize these overheads and dynamically measure the effects of task granularity. The HPX runtime system [1] employs asynchronous fine-grained task scheduling and incorporates a dynamic performance modeling capability, providing an ideal experimental platform. Using the performance counter capabilities in HPX, we characterize task scheduling overheads and show metrics to determine the optimal task size. This is the first step toward the goal of dynamically adapting task size to optimize parallel performance.

Citations: 30
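The trade-off this abstract studies, scheduling overhead versus task granularity, can be made concrete with a deliberately simple cost model. The sketch below is an assumption, not HPX code and not the authors' methodology: it charges a fixed `overhead` per task, runs tasks in equal waves across `workers` cores, and ignores memory effects, contention, and work stealing. Under those assumptions, very small grains waste time on per-task overhead, while very large grains leave too few tasks to keep every core busy, so predicted runtime bottoms out at an intermediate grain size.

```python
from typing import Iterable, Tuple


def predicted_time(total_work: int, grain: int, overhead: int, workers: int) -> int:
    """Toy makespan: ceil(total_work / grain) tasks of `grain` work units,
    each paying a fixed scheduling `overhead`, executed in
    ceil(tasks / workers) equal waves (a partial last task is rounded up)."""
    tasks = (total_work + grain - 1) // grain
    waves = (tasks + workers - 1) // workers
    return waves * (grain + overhead)


def best_grain(total_work: int, overhead: int, workers: int,
               candidates: Iterable[int]) -> Tuple[int, int]:
    """Return (grain, predicted_time) for the fastest candidate grain size."""
    return min(((g, predicted_time(total_work, g, overhead, workers))
                for g in candidates), key=lambda pair: pair[1])


if __name__ == "__main__":
    # Hypothetical inputs: 1 s of work in microseconds, 5 us overhead per task, 16 cores.
    W, O, P = 1_000_000, 5, 16
    grains = (1, 10, 100, 1_000, 10_000, 100_000)
    for g in grains:
        t = predicted_time(W, g, O, P)
        print(f"grain={g:>7} us  time={t:>8} us  efficiency={W / P / t:.2f}")
    print("best:", best_grain(W, O, P, grains))
```

With these hypothetical inputs (10^6 µs of work, 5 µs of overhead per task, 16 cores), the model picks a grain of 1,000 µs at 63,315 µs, close to the ideal 62,500 µs, while a 1 µs grain costs 375,000 µs. Real per-task overheads are not constant and would have to be measured, for example with the runtime's own performance counters as the abstract describes.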

Monitoring High Performance Computing Systems for the End User

2015 IEEE International Conference on Cluster Computing; Pub Date: 2015-09-08; DOI: 10.1109/CLUSTER.2015.124

C. Moore, P. Khalsa, Todd Alan Yilk, M. Mason

Citations: 4