2015 44th International Conference on Parallel Processing: Latest Papers (Page 9)

Slowing Little Quickens More: Improving DCTCP for Massive Concurrent Flows

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.78

Mao Miao, Peng Cheng, Fengyuan Ren, Ran Shu

Abstract: DCTCP is a potential TCP replacement designed to satisfy the requirements of data center networks, and it has attracted wide attention in both academia and industry. However, DCTCP supports only tens of concurrent flows well; with many more concurrent flows it suffers timeouts and throughput collapse. This falls far short of what data center networks require, since data centers that employ the partition/aggregation pattern usually involve hundreds of concurrent flows. In this paper, after tracing DCTCP's dynamic behavior through experiments, we identify two root causes of DCTCP's failure under the high fan-in traffic pattern: (1) the sending-window regulation mechanism is ineffective once cwnd has been reduced to its minimum size, and (2) the bursts induced by synchronized flows with small cwnd cause fatal packet loss that leads to severe timeouts. We enhance DCTCP to support massive concurrent flows by regulating the sending time interval and desynchronizing the sending time under particular conditions. The new protocol, called DCTCP+, outperforms DCTCP when the number of concurrent flows increases to several hundred. DCTCP+ effectively supports the short concurrent query responses in a benchmark from real production clusters, and keeps the same good performance when mixed with background traffic.
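
The first root cause can be illustrated with a minimal Python sketch. The estimator and the window reduction follow the standard DCTCP rule (alpha is an exponentially weighted average of the ECN-marked fraction, and cwnd is scaled by 1 - alpha/2). The floor `MIN_CWND`, the gain `g = 1/16`, and the `send_delay` function are assumptions made for illustration only; the abstract does not say how DCTCP+ actually regulates or desynchronizes the sending interval.

```python
import random

MIN_CWND = 2  # assumed window floor, in segments


def update_alpha(alpha, marked_fraction, g=1 / 16):
    """Standard DCTCP estimator: EWMA of the fraction of ECN-marked packets."""
    return (1 - g) * alpha + g * marked_fraction


def next_cwnd(cwnd, alpha, min_cwnd=MIN_CWND):
    """Standard DCTCP reduction, clamped at the minimum window."""
    return max(min_cwnd, cwnd * (1 - alpha / 2))


def send_delay(cwnd, alpha, rtt, rng, min_cwnd=MIN_CWND):
    """Hypothetical pacing rule (not from the paper): once cwnd sits at the
    floor, wait a random time that grows with the congestion estimate."""
    if cwnd > min_cwnd:
        return 0.0
    return rng.uniform(0.0, alpha * rtt)


if __name__ == "__main__":
    rng = random.Random(0)
    print(next_cwnd(10, 1.0))   # window can still shrink
    print(next_cwnd(2, 1.0))    # window is stuck at the floor
    print(send_delay(2, 0.5, 0.001, rng))
```

At the floor, a fully marked window leaves cwnd unchanged, so the only remaining lever in this sketch is the randomized delay, which also spreads synchronized senders apart in time.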

Citations: 6

MIFO: Multi-path Interdomain Forwarding

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.27

Ming Zhu, Dan Li, Y. Liu, Dan Pei, K. Ramakrishnan, Lili Liu, Jianping Wu

Abstract: Today's interdomain routing is traffic agnostic when determining the single, best forwarding path. Because it does not adapt to congestion, the path chosen is not always optimal. In this paper, we focus on designing a multi-path interdomain forwarding (MIFO) mechanism, in which AS border routers adaptively forward outbound traffic from a congested default path to an alternative path, without touching the interdomain routing protocols. Unlike previous efforts, which enable multi-path on the control plane, MIFO achieves multi-path on the data plane. The multiple alternative forwarding paths are obtained by exploring the local BGP RIB. Multi-path forwarding on the data plane can create a loop even within a stable network; MIFO solves this problem with a simple and practical approach. Several other challenges are also addressed, including preventing packets from cycling between iBGP peers and choosing the best alternative path from among multiple candidates. Our evaluations show that MIFO significantly improves the end-to-end throughput at the AS level, compared to traditional BGP and MIRO. For example, with only 50% of the ASes being MIFO capable, a significant percentage of the flows (about 40%) can use at least 50% of the inter-AS link capacity. In contrast, BGP and MIRO routing make less effective use of the inter-AS links: only 7% and 17% of the flows, respectively, can do so. Finally, we have developed a prototype implementation of MIFO on Linux, with the forwarding engine in the kernel and the routing daemon developed on the XORP platform. Experiments on a test bed built with prototypes show that MIFO can improve the aggregate throughput by 81% compared with BGP routing.
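
The core forwarding decision described above (stay on the default path unless it is congested, otherwise pick the best alternative) might look roughly like the following Python sketch. The `Path` type, the 0.9 utilization threshold, and the "most spare capacity" selection rule are all assumptions made for illustration; the abstract does not specify MIFO's congestion signal, its selection criterion, or its loop-prevention scheme, and none of those are reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical view of one candidate path taken from the local BGP RIB."""
    next_hop: str
    capacity_mbps: float
    load_mbps: float

    @property
    def spare_mbps(self):
        return self.capacity_mbps - self.load_mbps

    @property
    def utilization(self):
        return self.load_mbps / self.capacity_mbps


def pick_path(default, alternatives, congestion_threshold=0.9):
    """Keep the default path unless it is congested and an alternative
    offers more spare capacity (assumed selection rule)."""
    if default.utilization < congestion_threshold or not alternatives:
        return default
    best = max(alternatives, key=lambda p: p.spare_mbps)
    return best if best.spare_mbps > default.spare_mbps else default


if __name__ == "__main__":
    default = Path("AS1", 100, 95)
    alts = [Path("AS2", 100, 50), Path("AS3", 100, 80)]
    print(pick_path(default, alts).next_hop)
```

Because the decision uses only locally known candidates, nothing in this sketch changes what the router advertises to its neighbors, which matches the data-plane-only framing of the abstract.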

Citations: 3

On Maximizing Reliability of Lifetime Constrained Data Aggregation Tree in Wireless Sensor Networks

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.17

M. Shan, Guihai Chen, Fan Wu, Xiaobing Wu, Xiaofeng Gao, Pan Wu, Haipeng Dai

Citations: 4

Accelerating Spectral Calculation through Hybrid GPU-Based Computing

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.13

Jian Xiao, Xingyu Xu, Ce Yu, Jiawan Zhang, Shuinai Zhang, Li Ji, Ji-zhou Sun

Citations: 0

Privacy Preserving Market Schemes for Mobile Sensing

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.100

Yuan Zhang, Yunlong Mao, He Zhang, Sheng Zhong

Citations: 8

PDTL: Parallel and Distributed Triangle Listing for Massive Graphs

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.46

Ilias Giechaskiel, G. Panagopoulos, Eiko Yoneki

Abstract: This paper presents the first distributed triangle listing algorithm with provable CPU, I/O, memory, and network bounds. Finding all triangles (3-cliques) in a graph has numerous applications for density and connectivity metrics, but the majority of existing algorithms for massive graphs are sequential, while distributed versions of algorithms do not guarantee their CPU, I/O, memory, or network requirements. Our Parallel and Distributed Triangle Listing (PDTL) framework focuses on efficient external-memory access in distributed environments instead of fitting subgraphs into memory. It works by performing efficient orientation and load-balancing steps, and replicating graphs across machines by using an extended version of Hu et al.'s Massive Graph Triangulation algorithm. PDTL suits a variety of computational environments, from single-core machines to high-end clusters, and computes the exact triangle count on graphs of over 6B edges and 1B vertices (e.g. Yahoo graphs), outperforming and using fewer resources than the state-of-the-art systems PowerGraph, OPT, and PATRIC by 2x to 4x. Our approach thus highlights the importance of I/O in a distributed environment.
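
The orientation step mentioned above can be illustrated with a small in-memory Python sketch of the well-known degree-ordered triangle listing technique: each undirected edge is directed from its lower-ranked endpoint to its higher-ranked one, so every triangle is reported exactly once. This is not PDTL itself; it ignores external memory, load balancing, and distribution, and the function name `list_triangles` and the tie-breaking by vertex id are assumptions for illustration.

```python
from collections import defaultdict


def list_triangles(edges):
    """List each triangle once using degree-based edge orientation."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    def rank(v):
        # Order vertices by degree, breaking ties by vertex id.
        return (len(adj[v]), v)

    # Keep only edges that point from lower rank to higher rank.
    out = {v: {w for w in nbrs if rank(v) < rank(w)} for v, nbrs in adj.items()}

    triangles = []
    for u in out:
        for v in out[u]:
            # Any common out-neighbour w closes the triangle u -> v -> w.
            for w in out[u] & out[v]:
                triangles.append(tuple(sorted((u, v, w))))
    return sorted(triangles)


if __name__ == "__main__":
    print(list_triangles([(1, 2), (2, 3), (1, 3), (3, 4)]))
```

Orienting by degree keeps the out-neighbour sets of high-degree vertices small, which is the usual reason this ordering is preferred over an arbitrary one.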

Citations: 30

Nested Parallelism on GPU: Exploring Parallelization Templates for Irregular Loops and Recursive Computations

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.107

Da Li, Hancheng Wu, M. Becchi

Abstract: The effective deployment of applications exhibiting irregular nested parallelism on GPUs is still an open problem. A naïve mapping of irregular code onto the GPU hardware often leads to resource underutilization and, thereby, limited performance. In this work, we focus on two computational patterns exhibiting nested parallelism: irregular nested loops and parallel recursive computations. In particular, we focus on recursive algorithms operating on trees and graphs. We propose different parallelization templates aimed at increasing the GPU utilization of these codes. Specifically, we investigate mechanisms to effectively distribute irregular work to streaming multiprocessors and GPU cores. Some of our parallelization templates rely on dynamic parallelism, a feature recently introduced by Nvidia in their Kepler GPUs and announced as part of the OpenCL 2.0 standard. We propose mechanisms to maximize the work performed by nested kernels and minimize the overhead due to their invocation. Our results show that the use of our parallelization templates on applications with irregular nested loops can lead to a 2-6x speedup over baseline GPU codes that do not include load balancing mechanisms. The use of nested parallelism-based parallelization templates on recursive tree traversal algorithms can lead to substantial speedups (up to 15-24x) over optimized CPU implementations. However, the benefits of nested parallelism are still unclear in the presence of recursive applications operating on graphs, especially when recursive code variants require expensive synchronization. In these cases, a flat parallelization of iterative versions of the considered algorithms may be preferable.
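
To make the load-imbalance problem with irregular nested loops concrete, the following Python sketch shows one generic way to balance such a loop: flatten the (outer, inner) iteration space with a prefix sum and hand each worker an equal slice. This is a common technique shown for illustration only; it is not claimed to be one of the paper's templates, and `balanced_chunks` and its parameters are hypothetical names. Python is used to model the index arithmetic that a GPU kernel would perform per thread.

```python
from bisect import bisect_right
from itertools import accumulate


def balanced_chunks(inner_sizes, num_workers):
    """Split an irregular nested loop into near-equal chunks of (outer, inner) pairs.

    inner_sizes[i] is the trip count of the inner loop for outer iteration i.
    """
    # offsets[i] is the flat index of the first inner iteration of outer i.
    offsets = [0, *accumulate(inner_sizes)]
    total = offsets[-1]
    chunks = []
    for w in range(num_workers):
        start = total * w // num_workers
        end = total * (w + 1) // num_workers
        items = []
        for flat in range(start, end):
            # Binary search recovers the outer index that owns this flat index.
            outer = bisect_right(offsets, flat) - 1
            items.append((outer, flat - offsets[outer]))
        chunks.append(items)
    return chunks


if __name__ == "__main__":
    for chunk in balanced_chunks([1, 5, 0, 2], num_workers=2):
        print(chunk)
```

A naive mapping of one worker per outer iteration would give the workers 1, 5, 0, and 2 units of work in this example; the flattened split gives each of two workers exactly 4.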

Citations: 16

Parallel (Probable) Lock-Free Hash Sieve: A Practical Sieving Algorithm for the SVP

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.68

Artur Mariano, C. Bischof, Thijs Laarhoven

Citations: 36

Optimization of Resource Allocation and Energy Efficiency in Heterogeneous Cloud Data Centers

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.9

Amer Qouneh, Ming Liu, Tao Li

Abstract: Performance and energy efficiency are major concerns in cloud computing data centers. Often, they carry conflicting requirements, making optimization a challenge. Further complications arise when heterogeneous hardware and data center management technologies are combined. For example, heterogeneous hardware such as General Purpose Graphics Processing Units (GPGPUs) improves performance at the cost of greater power consumption, while virtualization technologies improve resource management and utilization at the cost of degraded performance. In this paper, we focus on exploiting the heterogeneity introduced by GPUs to reduce power budget requirements for servers while maintaining performance. To maintain or improve overall server performance at a reduced power budget, we propose two enhancements: (a) we borrow power from co-located multi-threaded virtual machines (VMs) and reallocate it to GPU VMs; (b) to compensate multi-threaded VMs and re-boost their performance, we propose to borrow virtual computing resources from GPU VMs and reallocate them to CPU VMs. Combining the two techniques minimizes the server power budget while maintaining overall server performance. Our results show that the server power budget can be reduced by almost 18% at an average cost of 13% performance degradation per virtual machine. In addition, reallocating virtual resources improves the performance of multi-threaded applications by 30% without affecting GPU applications. Combining both techniques reduces server energy consumption by 47% with minimum performance degradation.

Citations: 6

Joint Media Streaming Optimization of Energy and Rebuffering Time in Cellular Networks

2015 44th International Conference on Parallel Processing Pub Date : 2015-09-01 DOI: 10.1109/ICPP.2015.49

Zeqi Lai, Yong Cui, Yayun Bao, Jiangchuan Liu, Yingchao Zhao, Xiao Ma

Abstract: Streaming services are gaining popularity and contribute a tremendous fraction of today's cellular network traffic. Both playback fluency and battery endurance are significant performance metrics for mobile streaming services. However, because of unpredictable network conditions and the loose coupling between upper-layer streaming protocols and underlying network configurations, jointly optimizing rebuffering time and energy consumption for mobile streaming services remains a significant challenge. In this paper, we propose a novel framework that effectively addresses the above limitations and optimizes video transmission in cellular networks. We design two complementary algorithms in this framework, the Rebuffering Time Minimization Algorithm (RTMA) and the Energy Minimization Algorithm (EMA), to achieve smooth playback and energy efficiency on demand in multi-user scenarios. Our algorithms integrate cross-layer parameters to schedule video delivery. Specifically, RTMA aims at achieving the minimum rebuffering time with limited energy, and EMA tries to obtain the minimum energy consumption while meeting the rebuffering time constraint. Extensive simulation demonstrates that RTMA is able to reduce rebuffering time by at least 68% and EMA can achieve more than 27% energy reduction compared with other state-of-the-art solutions.
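
The rebuffering-time metric that both algorithms trade against energy can be made concrete with a toy Python model. This is a simplified sketch under stated assumptions (fixed bitrate, fixed time steps, and video downloaded in a step being playable within that same step); `rebuffering_time` and its parameters are hypothetical, and the sketch is neither RTMA nor EMA, whose scheduling logic the abstract does not describe.

```python
def rebuffering_time(throughput_kbps, bitrate_kbps, step_s=1.0):
    """Total stall time when playing a constant-bitrate video over a throughput trace.

    throughput_kbps: download rate observed in each time step.
    bitrate_kbps: encoding rate of the video.
    """
    buffer_s = 0.0   # seconds of video currently buffered
    stalled_s = 0.0  # accumulated rebuffering time
    for rate in throughput_kbps:
        # Seconds of video fetched during this step.
        buffer_s += rate * step_s / bitrate_kbps
        # Play as much of the step as the buffer allows; the rest is a stall.
        played = min(buffer_s, step_s)
        buffer_s -= played
        stalled_s += step_s - played
    return stalled_s


if __name__ == "__main__":
    print(rebuffering_time([1000, 1000], bitrate_kbps=1000))
    print(rebuffering_time([500, 1500], bitrate_kbps=1000))
```

The second trace has the same average throughput as the first but stalls during the slow step, which shows why the timing of delivery, and not only its total volume, matters for rebuffering.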

Citations: 1