Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM: latest papers (page 8)

Leveraging endpoint flexibility in data-intensive clusters

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2486021

Mosharaf Chowdhury, Srikanth Kandula, I. Stoica

Abstract: Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network bytes. By choosing the endpoints to avoid congested links, completion times of these transfers as well as that of others without similar flexibility can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed in any subset of machines as long as they are in multiple fault domains and ensure a balanced use of storage throughout the cluster. We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad, a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3X (respectively, 1.58X) faster as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. Sinbad does so with little impact on the long-term storage balance.

Citations: 168
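The placement idea in the abstract above (send replicas to fault domains behind the least congested links) can be illustrated with a minimal sketch. This is not Sinbad's actual algorithm: the function name `pick_replica_racks`, the per-rack utilization map, and the treatment of each rack as one fault domain are all assumptions made for illustration, and the storage-balance constraint is left out.

```python
def pick_replica_racks(link_load, replicas=3):
    """Return the `replicas` racks whose uplinks are least utilized.

    link_load maps a hypothetical rack name to its current uplink
    utilization in [0, 1]. Each rack is treated as one fault domain,
    so choosing distinct racks keeps replicas in separate domains.
    """
    if replicas > len(link_load):
        raise ValueError("not enough fault domains for the requested replicas")
    # Sort by load, then by name so ties are broken deterministically.
    ranked = sorted(link_load, key=lambda rack: (link_load[rack], rack))
    return ranked[:replicas]


if __name__ == "__main__":
    load = {"rack1": 0.9, "rack2": 0.2, "rack3": 0.5, "rack4": 0.1}
    print(pick_replica_racks(load, replicas=2))
```

A real placement policy would also have to weigh free storage per machine, which the abstract names as a requirement.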

Trinocular: understanding internet reliability through adaptive probing

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2486017

Lin Quan, J. Heidemann, Y. Pradkin

Abstract: Natural and human factors cause Internet outages, from big events like Hurricane Sandy in 2012 and the Egyptian Internet shutdown in Jan. 2011 to small outages every day that go unpublicized. We describe Trinocular, an outage detection system that uses active probing to understand reliability of edge networks. Trinocular is principled: deriving a simple model of the Internet that captures the information pertinent to outages, and populating that model through long-term data, and learning current network state through ICMP probes. It is parsimonious, using Bayesian inference to determine how many probes are needed. On average, each Trinocular instance sends fewer than 20 probes per hour to each /24 network block under study, increasing Internet "background radiation" by less than 0.7%. Trinocular is also predictable and precise: we provide known precision in outage timing and duration. Probing in rounds of 11 minutes, we detect 100% of outages one round or longer, and estimate outage duration within one-half round. Since we require little traffic, a single machine can track 3.4M /24 IPv4 blocks, all of the Internet currently suitable for analysis. We show that our approach is significantly more accurate than the best current methods, with about one-third fewer false conclusions, and about 30% greater coverage at constant accuracy. We validate our approach using controlled experiments, use Trinocular to analyze two days of Internet outages observed from three sites, and re-analyze three years of existing data to develop trends for the Internet.

Citations: 118
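The abstract above says Bayesian inference decides how many probes are needed. A minimal sketch of that general idea follows: update a belief that a block is up after each probe and stop once the belief is conclusive. The likelihood model, the `false_reply` rate, the 0.1/0.9 thresholds, and the function names are assumptions for illustration, not the values or code from the paper.

```python
def update_belief(belief_up, got_reply, availability, false_reply=0.01):
    """Apply Bayes' rule to the belief that a block is up after one probe.

    availability: assumed chance that a probed address answers when the
    block is up. false_reply: assumed chance of an answer when it is down.
    """
    if got_reply:
        like_up, like_down = availability, false_reply
    else:
        like_up, like_down = 1.0 - availability, 1.0 - false_reply
    numerator = like_up * belief_up
    return numerator / (numerator + like_down * (1.0 - belief_up))


def probes_needed(replies, belief_up=0.5, availability=0.6, lo=0.1, hi=0.9):
    """Consume probe outcomes until the belief is conclusive.

    Returns (number of probes used, final belief that the block is up).
    """
    count = 0
    for got_reply in replies:
        belief_up = update_belief(belief_up, got_reply, availability)
        count += 1
        if belief_up <= lo or belief_up >= hi:
            break
    return count, belief_up


if __name__ == "__main__":
    print(probes_needed([True]))
    print(probes_needed([False, False, False]))
```

Under these assumed parameters a single reply is enough to conclude the block is up, while several consecutive non-replies are needed before concluding it is down, which is why the probe count adapts to the observations.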

Session details: Data center networks 2

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/3261526

V. Padmanabhan

Citations: 0

Session details: Network measurement

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/3246331

V. Sekar

Citations: 0

Greedy forwarding for mobile social networks embedded in hyperbolic spaces

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2491728

Jingwei Zhang

Citations: 7

TCP ex machina: computer-generated congestion control

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2486020

Keith Winstein, H. Balakrishnan

Abstract: This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint's reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion-control algorithms to run at the endpoints. In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm (the control rules for the independent endpoints) that tries to achieve this objective. In simulations with ns-2, Remy-generated algorithms outperformed human-designed end-to-end techniques, including TCP Cubic, Compound, and Vegas. In many cases, Remy's algorithms also outperformed methods that require intrusive in-network changes, including XCP and Cubic-over-sfqCoDel (stochastic fair queueing with CoDel for active queue management). Remy can generate algorithms both for networks where some parameters are known tightly a priori, e.g. datacenters, and for networks where prior knowledge is less precise, such as cellular networks. We characterize the sensitivity of the resulting performance to the specificity of the prior knowledge, and the consequences when real-world conditions contradict the assumptions supplied at design-time.

Citations: 442

Named data networking on a router: forwarding at 20gbps and beyond

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2491699

W. So, A. Narayanan, D. Oran, M. Stapp

Citations: 42

ElasticSwitch: practical work-conserving bandwidth guarantees for cloud computing

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2486027

L. Popa, P. Yalagandula, S. Banerjee, J. Mogul, Yoshio Turner, J. R. Santos

Abstract: While cloud computing providers offer guaranteed allocations for resources such as CPU and memory, they do not offer any guarantees for network resources. The lack of network guarantees prevents tenants from predicting lower bounds on the performance of their applications. The research community has recognized this limitation but, unfortunately, prior solutions have significant limitations: either they are inefficient, because they are not work-conserving, or they are impractical, because they require expensive switch support or congestion-free network cores. In this paper, we propose ElasticSwitch, an efficient and practical approach for providing bandwidth guarantees. ElasticSwitch is efficient because it utilizes the spare bandwidth from unreserved capacity or underutilized reservations. ElasticSwitch is practical because it can be fully implemented in hypervisors, without requiring a specific topology or any support from switches. Because hypervisors operate mostly independently, there is no need for complex coordination between them or with a central controller. Our experiments, with a prototype implementation on a 100-server testbed, demonstrate that ElasticSwitch provides bandwidth guarantees and is work-conserving, even in challenging situations.

Citations: 205

Speeding up distributed request-response workflows

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2486028

Virajith Jalaparti, P. Bodík, Srikanth Kandula, Ishai Menache, M. Rybalkin, Chenyun Yan

Abstract: We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.

Citations: 146

Achieving high utilization with software-driven WAN

Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM Pub Date : 2013-08-12 DOI: 10.1145/2486001.2486012

C. Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, M. Nanduri, Roger Wattenhofer

Citations: 1113