{"title":"Elastic and Efficient Virtual Network Provisioning for Cloud-Based Multi-tier Applications","authors":"Meng Shen, Ke Xu, Fan Li, Kun Yang, Liehuang Zhu, Lei Guan","doi":"10.1109/ICPP.2015.102","DOIUrl":"https://doi.org/10.1109/ICPP.2015.102","url":null,"abstract":"The multi-tier architecture is prevalently adopted by cloud applications, such as the three-tier web application. It is highly desirable for both tenants and providers to provide virtual networks in an efficient and elastic way, where tenant applications can automatically scale in or out with varying workloads and providers can accommodate as many requests as possible in the underlying network. However, due to potential conflicts between efficiency and elasticity, it is challenging to achieve these two goals simultaneously in abstracting tenant requirements and designing corresponding provisioning algorithms. In this paper, we propose an efficient and elastic virtual network provisioning solution called EasyAlloc, which comprises an elasticity-aware abstraction model and a virtual network provisioning algorithm. To accurately capture the tenant requirement and maintain the provisioning simplicity for providers, the elasticity-aware model enables two types of decoupling, i.e., always-on VMs for normal load and on-demand VMs for dynamic scaling, and the bandwidth requirement of each VM for intra- and inter-tier communications. Then we formulate the virtual network provisioning as an overhead minimization problem, where the objective simultaneously considers the bandwidth and elasticity overhead. Due to the NP-completeness of this problem, we leverage two heuristics, slot reservation and tier iteration, to obtain an efficient algorithm. Extensive simulation results show that compared with a typical elasticity-agnostic method under a heavy load, EasyAlloc enables a 9% increase in the request acceptance rate and a 16.8% improvement in the successful extension rate.
To the best of our knowledge, this is the first work targeting elastic virtual network provisioning.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114430913","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EC-FRM: An Erasure Coding Framework to Speed Up Reads for Erasure Coded Cloud Storage Systems","authors":"Yingxun Fu, J. Shu, Zhirong Shen","doi":"10.1109/ICPP.2015.57","DOIUrl":"https://doi.org/10.1109/ICPP.2015.57","url":null,"abstract":"With reliability requirements becoming increasingly important, erasure codes have been widely used in today's cloud storage systems because they achieve both high reliability and low storage overhead. However, the performance of most existing erasure codes can be further improved on both normal reads of user data without device failures and degraded reads under device failures, which are crucial in cloud storage systems. In this paper, we propose an erasure coding framework named EC-FRM to integrate existing codes in order to improve the read performance. The code constructed over EC-FRM, named EC-FRM-Code, keeps most of the desirable properties of the integrated code and achieves good performance on both normal reads and degraded reads. We transform Reed-Solomon code and LRC code to EC-FRM-RS and EC-FRM-LRC respectively, and then conduct a series of experiments to evaluate their read performance.
The results show that EC-FRM-RS code gains 19.2% to 33.9% higher normal read speed and 9.1% to 9.9% higher degraded read speed than standard Reed-Solomon code, while EC-FRM-LRC code achieves 23.5% to 46.9% higher normal read speed and 3.3% to 12.8% higher degraded read speed than standard LRC code.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"24 18","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120824347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GLAF: A Visual Programming and Auto-tuning Framework for Parallel Computing","authors":"K. Krommydas, Ruchira Sasanka, Wu-chun Feng","doi":"10.1109/ICPP.2015.95","DOIUrl":"https://doi.org/10.1109/ICPP.2015.95","url":null,"abstract":"The past decade's computing revolution has delivered parallel hardware to the masses. However, the ability to exploit its capabilities and ignite scientific breakthrough at a proportionate level remains a challenge due to the lack of parallel programming expertise. Although different solutions have been proposed to facilitate harvesting the seeds of parallel computing, most target seasoned programmers and ignore the special nature of a target audience like domain experts. This paper addresses the challenge of realizing a programming abstraction and implementing an integrated development framework for this audience. We present GLAF -- a grid-based language and auto-parallelizing, auto-tuning framework. Its key elements are its intuitive visual programming interface, which attempts to render expressing and validating an algorithm easier for domain experts, and its ability to automatically generate efficient serial and parallel Fortran and C code, including potentially beneficial code modifications (e.g., with respect to data layout).
We find that the above features help novice programmers avoid common programming pitfalls and provide fast implementations.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115519070","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Buffer Management Strategy on Spray and Wait Routing Protocol in DTNs","authors":"E. Wang, Yongjian Yang, Jie Wu, Wenbin Liu","doi":"10.1109/ICPP.2015.89","DOIUrl":"https://doi.org/10.1109/ICPP.2015.89","url":null,"abstract":"Due to unpredictable node mobility and easily interrupted connections, routing protocols in DTNs commonly utilize multiple message copies to improve the delivery ratio. A store-carry-and-forward paradigm is also designed to assist routing messages. However, excessive message copies lead to rapid consumption of the limited storage and bandwidth. The Spray and Wait routing protocol has been proposed to reduce the network overload caused by the storage and transmission of unrestricted message copies. However, there still exist congestion problems when a node's buffer is quite constrained. In this paper, we propose a message Scheduling and Drop Strategy on Spray and Wait Routing Protocol (SDSRP). To improve the delivery ratio, first of all, SDSRP calculates the priority of each message by evaluating the impact of both replicating and dropping a message copy on delivery ratio. Subsequently, scheduling and drop decisions are made according to the priority. Finally, we conduct extensive simulations based on synthetic and real traces in the ONE simulator.
The results show that, compared with other buffer management strategies, SDSRP achieves higher delivery ratio, similar average hop counts, and lower overhead ratio.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"110 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122399774","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CloudFog: Towards High Quality of Experience in Cloud Gaming","authors":"Yuhua Lin, Haiying Shen","doi":"10.1109/ICPP.2015.59","DOIUrl":"https://doi.org/10.1109/ICPP.2015.59","url":null,"abstract":"With the increasing popularity of Massively Multiplayer Online Games (MMOGs) and the fast growth of mobile gaming, cloud gaming shows great promise over the conventional MMOG gaming model as it frees players from the requirement of hardware and game installation on their local computers. However, as the graphics rendering is offloaded to the cloud, the data transmission between the end-users and the cloud significantly increases the response latency and limits the user coverage, thus preventing cloud gaming from achieving high user Quality of Experience (QoE). To solve this problem, previous research suggested deploying more data centers, but it comes at a prohibitive cost. We propose a lightweight system called CloudFog, which incorporates \"fog\" consisting of super nodes that are responsible for rendering game videos and streaming them to their nearby players. Fog enables the cloud to be only responsible for the intensive game state computation and sending update information to super nodes, which significantly reduces the traffic and hence the latency and bandwidth consumption. To further enhance QoE, we propose the receiver-driven encoding rate adaptation strategy to increase the playback continuity and the deadline-driven sender buffer scheduling strategy to ensure that the segments arrive at the players within their response latency.
Experimental results from PeerSim and PlanetLab show the effectiveness and efficiency of CloudFog and our individual strategies in increasing user coverage and reducing response latency and bandwidth consumption.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123619876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GEM: A Framework for Developing Shared-Memory Parallel Genomic Applications on Memory Constrained Architectures","authors":"Mucahid Kutlu, G. Agrawal","doi":"10.1109/ICPP.2015.92","DOIUrl":"https://doi.org/10.1109/ICPP.2015.92","url":null,"abstract":"The amount of available genomic data is increasing rapidly with the recent developments in sequencing technologies. Analysis of such data can potentially lead to significant advancements in medical research and even practice. However, it is imperative to exploit parallelism and utilize computational resources effectively to handle large scale genomic data. At the same time, the trends in computing technologies are towards architectures with a large number of cores and smaller memory size per core (e.g. Intel Xeon Phi). Innovative solutions that meet the requirements of parallel genomic data processing with the constraints of the new computational architectures are urgently needed. In this work, we develop a novel middleware system, GEM, for developing shared-memory parallel genomic applications on memory-constrained architectures. We propose a load-map-reduce approach and a novel scheduling scheme to decrease I/O contention and prevent over-consumption of the limited memory. We also use domain specific knowledge to decrease the memory requirements of the tasks. In our experiments, we show that GEM has high scalability on the Intel Xeon Phi architecture.
We also compare GEM against two other frameworks for genomic data processing, GATK and PAGE, and show that our middleware outperforms both.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125473879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Pattern-Driven Hybrid Multi- and Many-Core Acceleration in the MPAS Shallow-Water Model","authors":"P. Zhang, Yulong Ao, Chao Yang, Yiqun Liu, Fangfang Liu, Changmao Wu, Haitao Zhao","doi":"10.1109/ICPP.2015.16","DOIUrl":"https://doi.org/10.1109/ICPP.2015.16","url":null,"abstract":"There is an urgent demand for efficient methodologies to enable hybrid multi- and many-core accelerations in global climate simulations. The Model for Prediction Across Scales (MPAS) is a family of earth-system component models that is receiving increasing attention. Like many other models, MPAS, though it features some emerging numerical algorithms, employs a pure MPI approach for parallel computing, which, to date, lacks support for multi-threaded parallelism, especially on many-core accelerated systems. In this work, we extend the shallow-water model in MPAS to demonstrate a pattern-driven approach for hybrid multi- and many-core accelerations of climate models. We first identify all basic computation patterns through a rigorous analysis of the MPAS code. Then for the whole model, we use the identified patterns as building blocks to draw a data-flow diagram, which serves as a perfect indicator to recognize data dependencies and exploit inherent parallelism. And finally, based on the data-flow diagram, a hybrid algorithm is designed to support concurrent computations done on both multi-core CPUs and many-core accelerators. We implement the algorithm and optimize it on an x86-based heterogeneous supercomputer equipped with both Intel Xeon CPUs and Intel Xeon Phi devices.
Experiments show that our hybrid design is able to deliver an 8.35x speedup as compared to the original code and scales up to 64 processes with a nearly ideal parallel efficiency.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130136778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DISCS: A DIStributed Collaboration System for Inter-AS Spoofing Defense","authors":"Bingyang Liu, J. Bi","doi":"10.1109/ICPP.2015.25","DOIUrl":"https://doi.org/10.1109/ICPP.2015.25","url":null,"abstract":"IP spoofing is prevalently used in DDoS attacks for anonymity and amplification, making them harder to prevent. Combating spoofing attacks requires the collaboration of different autonomous systems (ASes). Existing methods either lack flexibility in collaboration or require centralized control in the inter-AS environment. In this paper, we propose a Distributed Collaboration System (DISCS) for inter-AS spoofing defense, which allows ASes to flexibly collaborate in spoofing defense in a distributed manner. Each DISCS-enabled AS implements four defense functions. When a victim AS is under a spoofing attack, it can request other ASes to execute the most appropriate defense functions. We present the distributed and flexible control plane design and the backward compatible and incrementally deployable data plane design for both IPv4 and IPv6. We evaluate DISCS with theoretical proof and simulations using real Internet data. The results show that DISCS has strong deployment incentives, high effectiveness, minimal false positives, modest resource consumption and strong security.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115769206","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPGPU Benchmark Suites: How Well Do They Sample the Performance Spectrum?","authors":"Jee Ho Ryoo, S. Quirem, Michael LeBeane, Reena Panda, Shuang Song, L. John","doi":"10.1109/ICPP.2015.41","DOIUrl":"https://doi.org/10.1109/ICPP.2015.41","url":null,"abstract":"Recently, GPGPUs have positioned themselves in the mainstream processor arena with their potential to perform a massive number of jobs in parallel. At the same time, many GPGPU benchmark suites have been proposed to evaluate the performance of GPGPUs. Both academia and industry have been introducing new sets of benchmarks each year while some already published benchmarks have been updated periodically. However, some benchmark suites contain benchmarks that are duplicates of each other or use the same underlying algorithm. This results in an excess of workloads in the same performance spectrum. In this paper, we provide a methodology to obtain a set of new GPGPU benchmarks that are located in the unexplored region of the performance spectrum. Our proposal uses statistical methods to understand the performance spectrum coverage and uniqueness of existing benchmark suites. Later we show techniques to identify areas that are not explored by existing benchmarks by visually showing the performance spectrum coverage. Finding unique key metrics for future benchmarks to broaden its performance spectrum coverage is also explored using hierarchical clustering and ranking by Hotelling's T2 method. Finally, key metrics are categorized into GPGPU performance related components to show how future benchmarks can stress each of the categorized metrics to distinguish themselves in the performance spectrum.
Our methodology can serve as a performance spectrum oriented guidebook for designing future GPGPU benchmarks.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115964614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Is Your Graph Algorithm Eligible for Nondeterministic Execution?","authors":"Zhiyuan Shao, Ling Hou, Yang Ai, Yu Zhang, Hai Jin","doi":"10.1109/ICPP.2015.52","DOIUrl":"https://doi.org/10.1109/ICPP.2015.52","url":null,"abstract":"Graph algorithms are used to implement data mining tasks on graph data-sets. Besides conducting the algorithms in the default deterministic manner, some graph processing frameworks, especially those supporting an asynchronous execution model, provide interfaces for the algorithms to be executed in a nondeterministic manner, which can improve the scalability and performance of the algorithm's executions. However, is the graph algorithm eligible for nondeterministic execution, and will the execution produce expected results? The literature gives few answers to these questions. In this paper, we study the nondeterministic execution of graph algorithms by considering the scenario where data dependences happen in the edges in graph processing frameworks that employ an asynchronous execution model. Our study reveals that only by guaranteeing the atomicity of individual reads and writes, some algorithms (e.g., graph traversal algorithms) can converge by recovering from corrupted intermediate results with nondeterministic execution, and thus tolerate even write-write conflicts, while some other algorithms (e.g., fixed point iteration algorithms) can converge but tolerate only read-write conflicts.
By conducting graph algorithms on real-world graphs in GraphChi, and comparing their performances and results with deterministic executions, we find that their performance gains are generally scalable to the available processors with nondeterministic executions, and the results at convergence of fixed point iteration algorithms from nondeterministic executions exhibit larger variances from one run to another than their deterministic executions.","PeriodicalId":423007,"journal":{"name":"2015 44th International Conference on Parallel Processing","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126619516","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}