IEEE International Symposium on High-Performance Parallel Distributed Computing: Latest Publications

Communication-driven scheduling for virtual clusters in cloud

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600714

Haibao Chen, Song Wu, S. Di, B. Zhou, Zhenjiang Xie, Hai Jin, Xuanhua Shi

Abstract: Due to high flexibility and cost-effectiveness, cloud computing is increasingly being explored by academic and commercial users as an alternative to local clusters. Recent research has already confirmed the feasibility of running tightly-coupled parallel applications on virtual clusters. However, such applications suffer from significant performance degradation, especially since overcommitment is common in the cloud; that is, the number of executable Virtual CPUs (VCPUs) is often larger than the number of available Physical CPUs (PCPUs) in the system. The degradation mainly results from the fact that current Virtual Machine Monitors (VMMs) cannot co-schedule (i.e., coordinate at the same time) the VCPUs that host parallel application threads/processes with synchronization requirements.

In this paper, we introduce a communication-driven scheduling approach for virtual clusters, which can effectively mitigate the performance degradation of tightly-coupled parallel applications running atop them in overcommitted situations. There are two key contributions: 1) we propose a communication-driven VM scheduling (CVS) algorithm, by which the involved VMM schedulers can autonomously schedule suitable VMs at runtime; 2) we integrate the CVS algorithm into the Xen VMM scheduler and rigorously implement a prototype. We evaluate our design in a real cluster environment, and experiments show that our solution attains better performance for tightly-coupled parallel applications than state-of-the-art approaches such as Xen's Credit scheduler, balance scheduling, and hybrid scheduling.

Citations: 2
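The abstract does not spell out the CVS algorithm itself. As rough intuition only, the sketch below shows one hypothetical way a per-host scheduler could let communication drive its choice: a VM with pending incoming messages (whose remote peer is likely blocked waiting on it) is picked before others, with ties broken by remaining credit. The `VM` class and its `credit` and `pending_msgs` fields are illustrative assumptions, not the paper's or Xen's data structures.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    credit: int        # hypothetical remaining CPU budget (Credit-scheduler-like)
    pending_msgs: int  # hypothetical count of undelivered incoming messages


def pick_next(run_queue: list[VM]) -> VM:
    """Choose the next VM to run on a PCPU (illustrative rule only).

    A VM that has messages waiting is preferred, because a peer VM on
    another host is probably blocked on its reply. Among equals, the VM
    with more remaining credit wins.
    """
    return max(run_queue, key=lambda vm: (vm.pending_msgs > 0, vm.credit))


queue = [VM("vm-a", credit=300, pending_msgs=0),
         VM("vm-b", credit=100, pending_msgs=2),
         VM("vm-c", credit=200, pending_msgs=1)]
print(pick_next(queue).name)  # vm-c: has pending messages and more credit than vm-b
```

Note that vm-a, despite having the most credit, loses to both communicating VMs; a real scheduler would also need fairness safeguards so that chatty VMs cannot starve the others.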

When paxos meets erasure code: reduce network and storage cost in state machine replication

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600218

Shuai Mu, Kang Chen, Yongwei Wu, Weimin Zheng

Abstract: Paxos-based state machine replication is a key technique for building highly reliable and available distributed services, such as lock servers, databases, and other data storage systems. Paxos can tolerate the crash of any minority of nodes in an asynchronous network environment. Traditionally, Paxos is used to perform full-copy replication across all participants. However, full copies are expensive in terms of both network and storage cost, especially in wide-area deployments with commodity hard drives.

In this paper, we discuss the non-triviality and feasibility of combining erasure coding with the Paxos protocol, and present an improved protocol named RS-Paxos (Reed-Solomon Paxos). To the best of our knowledge, we are the first to propose such a combination. Compared to Paxos, RS-Paxos requires a limit on the number of possible failures: if the number of tolerated failures is decreased by 1, RS-Paxos can save over 50% of network transmission and disk I/O. To demonstrate the benefits of our protocol, we designed and built a key-value store based on RS-Paxos and evaluated it on EC2 under various settings. Experimental results show that, in common configurations, RS-Paxos achieves up to 2.5x improvement in write throughput and as much as 30% reduction in latency.

Citations: 20
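A small worked sketch of the trade-off described above, under a simplified model stated here as an assumption: a value is Reed-Solomon coded into N fragments (one per replica), any X of which reconstruct it, and read and write quorums of size Q must overlap in at least X nodes, i.e. 2Q - X >= N. Tolerated failures are then N - Q. With X = 1 this reduces to ordinary majority-quorum Paxos. The function name `rs_paxos_config` is hypothetical.

```python
from fractions import Fraction


def rs_paxos_config(n: int, x: int, q: int) -> tuple[int, Fraction]:
    """Return (tolerated_failures, per-replica data fraction) for symmetric quorums.

    Assumed model: each of n replicas stores one coded fragment of size 1/x
    of the value; read and write quorums both have size q, and any two
    quorums must share at least x replicas so a reader can always decode
    the latest write: 2*q - x >= n.
    """
    if 2 * q - x < n:
        raise ValueError("quorums overlap in fewer than x replicas; unsafe")
    return n - q, Fraction(1, x)


# Classic Paxos on 5 nodes: full copies (x = 1), majority quorums of 3.
print(rs_paxos_config(5, 1, 3))  # (2, Fraction(1, 1))
# Coded variant: tolerate one fewer failure (q = 4), each replica holds 1/3.
print(rs_paxos_config(5, 3, 4))  # (1, Fraction(1, 3))
```

Under these assumptions, giving up one tolerated failure on five replicas shrinks each replica's share from a full copy to one third, a saving of about 67% in data written and sent per replica, in line with the "over 50%" figure quoted in the abstract.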

CBL: exploiting community based locality for efficient content search in online social networks

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600707

Hanhua Chen, Fan Zhang, Hai Jin

Abstract: Retrieving relevant data for users in online social network (OSN) systems is a challenging problem. Cassandra, a storage system used by popular OSN systems such as Facebook and Twitter, relies on a DHT-based scheme to randomly partition users' personal data among servers across multiple data centers. Although a DHT scales well to host a large number of users (and their personal data), it leads to costly inter-server communication across data centers because of the complex interconnections and interactions among OSN users. In this paper, we explore how to retrieve OSN content cost-effectively while retaining the simple and robust nature of OSNs. Our approach exploits a simple yet powerful principle called Community-Based Locality (CBL), which posits that if a user has a one-hop neighbor within a particular community, the user very likely has other one-hop neighbors inside the same community. We demonstrate the existence of community-based locality in diverse traces of popular OSN systems such as Facebook, Orkut, Flickr, YouTube, and LiveJournal.

Based on this observation, we design a CBL-based algorithm to build the content index in OSN systems. By partitioning and indexing the relevant data of users within a community on the same server in the data center, the CBL-based index avoids a significant amount of inter-server communication during searching, making retrieval of relevant data for a user in large-scale OSNs efficient. In addition, the CBL-based scheme provides much shorter query latency and balanced load. We conduct comprehensive trace-driven simulations to evaluate the performance of the proposed scheme. Results show that our scheme reduces network traffic by 73% compared with existing schemes.

Citations: 1
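To make the locality argument concrete, here is a toy comparison of how many friendship edges cross server boundaries when users are spread without regard to the social graph versus placed by community. The graph, the community labels, the round-robin placement (standing in for a DHT's effectively random placement), and the `cross_server_edges` helper are all hypothetical; the paper's CBL index construction is more involved.

```python
def cross_server_edges(edges, placement):
    """Count edges whose two endpoints live on different servers."""
    return sum(1 for u, v in edges if placement[u] != placement[v])


# Two tight communities {a, b, c} and {d, e, f}, joined by a single edge c-d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
users = "abcdef"
num_servers = 2

# Graph-oblivious placement: round-robin, a stand-in for DHT-style spreading.
oblivious = {u: i % num_servers for i, u in enumerate(users)}
# Community-based placement: all members of a community share a server.
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

print(cross_server_edges(edges, oblivious))  # 5
print(cross_server_edges(edges, community))  # 1
```

Because most of a user's neighbors sit in the same community, co-locating communities turns most neighbor lookups into local ones; only the inter-community edge still crosses servers.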

Domino: an incremental computing framework in cloud with eventual synchronization

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600705

Dong Dai, Yong Chen, D. Kimpe, R. Ross, Xuehai Zhou

Citations: 5

Efficient task placement and routing of nearest neighbor exchanges in dragonfly networks

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600225

B. Prisacari, G. Rodríguez, P. Heidelberger, Dong Chen, C. Minkenberg, T. Hoefler

Abstract: Dragonflies are recent network designs and one of the most promising topologies for the Exascale effort due to their scalability and cost. While able to achieve very high throughput under random uniform all-to-all traffic, this type of network can experience significant performance degradation for other common high-performance computing workloads such as stencil (multi-dimensional nearest neighbor) patterns. Often, the shortfall from peak performance is caused by an insufficient understanding of the interaction between the workload and the network, and of how application-specific task-to-node mapping strategies can serve as optimization vehicles.

To address these issues, we propose a theoretical performance analysis framework that takes as inputs a network specification and a traffic demand matrix characterizing an arbitrary workload, and predicts where bottlenecks will occur in the network and what their impact will be on the effective sustainable injection bandwidth. We then focus our analysis on a specific high-interest communication pattern, the multi-dimensional Cartesian nearest neighbor exchange, and provide analytic bounds (owing to bottlenecks in the remote links of the Dragonfly) on its expected performance across a multitude of possible mapping strategies.

Finally, using a comprehensive set of simulation results, we validate the correctness of the theoretical approach and, in the process, address some misconceptions regarding Dragonfly network behavior and evaluation (such as the choice of throughput maximization over workload completion time minimization as the optimization objective) and the question of whether the standard notion of Dragonfly balance can be extended to workloads other than uniform random traffic.

Citations: 44
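The core of such a bottleneck analysis can be illustrated with a simplified model: given a traffic demand matrix and a fixed route for each source-destination pair, sum the demand placed on every link; the link with the worst capacity-to-load ratio bounds the rate at which all nodes can inject. The node and link names, routes, and capacities below are hypothetical, and real Dragonfly routing (minimal, Valiant, adaptive) is not modeled.

```python
from collections import defaultdict


def sustainable_injection(demand, routes, capacity):
    """Return (bottleneck_link, max_rate_scale).

    demand[(s, d)]: traffic s sends to d per unit of injection rate.
    routes[(s, d)]: list of link names the flow traverses (assumed fixed).
    capacity[link]: link bandwidth in the same units.
    Scaling all demand by r is feasible while r * load[link] <= capacity[link].
    """
    load = defaultdict(float)
    for pair, amount in demand.items():
        for link in routes[pair]:
            load[link] += amount
    bottleneck = min(load, key=lambda link: capacity[link] / load[link])
    return bottleneck, capacity[bottleneck] / load[bottleneck]


# Two groups joined by a single global link "g01"; flows of 1 unit each,
# two of which must cross g01 to reach the other group.
demand = {("n0", "n2"): 1.0, ("n1", "n3"): 1.0, ("n0", "n1"): 1.0}
routes = {("n0", "n2"): ["l0", "g01"],
          ("n1", "n3"): ["l1", "g01"],
          ("n0", "n1"): ["l0"]}
capacity = {"l0": 4.0, "l1": 4.0, "g01": 1.0}
print(sustainable_injection(demand, routes, capacity))  # ('g01', 0.5)
```

In this toy, the global link receives twice its capacity's worth of demand, so injection must be throttled to 50%; the local links still have headroom, echoing the abstract's point that remote links are where the bottlenecks form.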

Computation and communication efficient graph processing with distributed immutable view

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600233

Rong Chen, X. Ding, Peng Wang, Haibo Chen, B. Zang, Haibing Guan

Abstract: Cyclops is a new vertex-oriented graph-parallel framework for writing distributed graph analytics. Unlike existing distributed graph computation models, Cyclops retains simplicity and computational efficiency by synchronously computing over a distributed immutable view, which grants each vertex read-only access to all its neighboring vertices. The view is provided via read-only replication of vertices for edges that span machines in a graph cut. Cyclops follows a centralized computation model by assigning a master vertex to update its value and propagate it unidirectionally to its replicas in each iteration, which can significantly reduce messages and avoid contention on replicas. Aware of the pervasiveness of multicore-based clusters, Cyclops is further extended with a hierarchical processing model, which aggregates messages and replicas within a single multicore machine and transparently decomposes each worker into multiple threads on demand for different stages of computation. We have implemented Cyclops based on Hama, an open-source Pregel clone. Our evaluation using a set of graph algorithms on an in-house multicore cluster shows that Cyclops outperforms Hama by 2.06X to 8.69X and 5.95X to 23.04X using the hash-based and Metis partition algorithms, respectively, due to the elimination of contention on messages and the hierarchical optimization for multicore-based clusters. Cyclops (written in Java) also performs comparably to PowerGraph (written in C++) despite the language difference, due to its significantly lower number of messages and avoided contention.

Citations: 58
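A single-process sketch of the computation style described above: in each synchronous iteration, every vertex reads its neighbors' values from a read-only snapshot of the previous iteration (standing in for the distributed immutable view), and only the vertex's owner, its "master", writes the new value. PageRank is used as the example algorithm; the graph and the damping factor of 0.85 are illustrative choices, and distribution, replicas, and messaging are not modeled.

```python
from types import MappingProxyType


def pagerank(out_edges, iterations=20, d=0.85):
    """Synchronous PageRank over a read-only per-iteration view."""
    n = len(out_edges)
    in_edges = {v: [] for v in out_edges}
    for u, targets in out_edges.items():
        for v in targets:
            in_edges[v].append(u)
    rank = {v: 1.0 / n for v in out_edges}
    for _ in range(iterations):
        # Read-only snapshot: neighbors can read it, nobody can write it,
        # so no locking or contention is needed during the iteration.
        view = MappingProxyType(dict(rank))
        # Each master computes its own new value using only reads from the view.
        rank = {
            v: (1 - d) / n
            + d * sum(view[u] / len(out_edges[u]) for u in in_edges[v])
            for v in out_edges
        }
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(graph)
print(round(sum(ranks.values()), 6))  # 1.0
```

Because writes go only to the fresh `rank` dict and reads only to the frozen `view`, the update order within an iteration does not matter, which is the property that lets a distributed system push each master's value to its replicas in one direction without conflicts.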

MRONLINE: MapReduce online performance tuning

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600229

Min Li, Liangzhao Zeng, S. Meng, Jian Tan, Li Zhang, A. Butt, Nicholas C. Fuller

Abstract: MapReduce job parameter tuning is a daunting and time-consuming task. The parameter configuration space is huge; more than 70 parameters impact job performance. It is also difficult for users to determine suitable parameter values without first having a good understanding of the MapReduce application's characteristics. Thus, it is challenging to systematically explore the parameter space and select a near-optimal configuration. Extant offline tuning approaches are slow and inefficient, as they entail multiple test runs and significant human effort.

To this end, we propose an online performance tuning system, MRONLINE, that monitors a job's execution, tunes the associated performance-tuning parameters based on collected statistics, and provides fine-grained control over parameter configuration. MRONLINE allows each task to have a different configuration, instead of having to use the same configuration for all tasks. Moreover, we design a gray-box-based smart hill climbing algorithm that can efficiently converge to a near-optimal configuration with high probability. To improve search quality and increase convergence speed, we also incorporate a set of MapReduce-specific tuning rules into MRONLINE. Our results using a real implementation on a representative 19-node cluster show that dynamic performance tuning can improve MapReduce application performance by up to 30% compared to the default configuration used in YARN.

Citations: 139
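The search component can be pictured as hill climbing over a discrete parameter space. The sketch below is a plain greedy hill climber, not MRONLINE's gray-box smart variant (no random restarts, no tuning rules). The parameter names `sort_mb` and `parallel_copies`, their value grids, and the `runtime` cost function are hypothetical stand-ins for real configuration knobs and measured task statistics.

```python
def hill_climb(start, grids, cost, max_steps=100):
    """Greedy hill climbing: move to the best neighbor until none improves.

    start: dict param -> current value (must appear in grids[param]).
    grids: dict param -> sorted list of allowed values.
    cost: callable(config) -> measured cost (lower is better).
    A neighbor changes exactly one parameter by one grid step.
    """
    current, current_cost = dict(start), cost(start)
    for _ in range(max_steps):
        best, best_cost = None, current_cost
        for p, values in grids.items():
            i = values.index(current[p])
            for j in (i - 1, i + 1):
                if 0 <= j < len(values):
                    cand = {**current, p: values[j]}
                    c = cost(cand)
                    if c < best_cost:
                        best, best_cost = cand, c
        if best is None:  # no neighbor improves: local optimum reached
            break
        current, current_cost = best, best_cost
    return current, current_cost


# Hypothetical knobs and a made-up runtime model with its optimum at (200, 10).
grids = {"sort_mb": [50, 100, 150, 200, 250, 300],
         "parallel_copies": [5, 10, 15, 20]}


def runtime(cfg):
    return 100 + (cfg["sort_mb"] - 200) ** 2 / 100 + (cfg["parallel_copies"] - 10) ** 2


result = hill_climb({"sort_mb": 50, "parallel_copies": 20}, grids, runtime)
print(result)  # ({'sort_mb': 200, 'parallel_copies': 10}, 100.0)
```

On this smooth toy model greedy search finds the global optimum; on real, noisy job measurements a plain climber can stall in local optima, which is the kind of problem the smarter, rule-guided search described in the abstract is meant to address.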

Bobolang: a language for parallel streaming applications

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600711

Zbynek Falt, D. Bednárek, Martin Kruliš, J. Yaghob, F. Zavoral

Citations: 19

SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600715

Nusrat S. Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, D. Panda

Citations: 31

A methodology for evaluating the impact of data compression on climate simulation data

IEEE International Symposium on High-Performance Parallel Distributed Computing Pub Date: 2014-06-23 DOI: 10.1145/2600212.2600217

A. Baker, Haiying Xu, J. Dennis, M. Levy, D. Nychka, S. Mickelson, Jim Edwards, M. Vertenstein, Al Wegener

Abstract: High-resolution climate simulations require tremendous computing resources and can generate massive datasets. At present, preserving the data from these simulations consumes vast storage resources at institutions such as the National Center for Atmospheric Research (NCAR). Historical data generation trends are economically unsustainable, and storage resources are already beginning to limit science objectives. To mitigate this problem, we investigate the use of data compression techniques on climate simulation data from the Community Earth System Model. Ultimately, to convince climate scientists to compress their simulation data, we must be able to demonstrate that the reconstructed data reveals the same mean climate as the original data; this paper is a first step toward that goal. To that end, we develop an approach for verifying the climate data and use it to evaluate several compression algorithms. We find that the diversity of the climate data requires the individual treatment of variables, and, in doing so, the reconstructed data can fall within the natural variability of the system while achieving compression rates of up to 5:1.

Citations: 94
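One simple way to picture the verification idea: compress a variable lossily, then accept the result only if the reconstruction error at every point is small relative to the variable's natural variability, here approximated by the standard deviation across an ensemble of runs. The fixed-step quantizer, the synthetic temperature values and ensemble offsets, and the 10% tolerance are illustrative assumptions, not the paper's compression algorithms, metrics, or CESM data.

```python
import statistics


def quantize(values, step):
    """Lossy compression stand-in: store each value as an integer multiple of step."""
    return [round(v / step) for v in values]


def dequantize(codes, step):
    """Reconstruct approximate values from the stored integer codes."""
    return [c * step for c in codes]


def acceptable(original, reconstructed, ensemble, tol=0.1):
    """Accept if every point's error is within tol x the ensemble spread there."""
    for i, (o, r) in enumerate(zip(original, reconstructed)):
        spread = statistics.stdev([member[i] for member in ensemble])
        if abs(o - r) > tol * spread:
            return False
    return True


# Synthetic "temperature" field (K) and an ensemble of perturbed runs.
original = [280.12, 281.47, 279.86, 282.03]
ensemble = [[v + off for v in original] for off in (-1.0, -0.5, 0.0, 0.5, 1.0)]

fine = dequantize(quantize(original, 0.05), 0.05)
coarse = dequantize(quantize(original, 0.5), 0.5)
print(acceptable(original, fine, ensemble))    # True
print(acceptable(original, coarse, ensemble))  # False
```

The coarser step would store fewer distinct codes (and so compress better), but its errors exceed the allowed fraction of natural variability; per-variable tolerances of this kind are one way to read the abstract's finding that variables need individual treatment.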