{"title":"Leveraging endpoint flexibility in data-intensive clusters","authors":"Mosharaf Chowdhury, Srikanth Kandula, I. Stoica","doi":"10.1145/2486001.2486021","DOIUrl":"https://doi.org/10.1145/2486001.2486021","url":null,"abstract":"Many applications do not constrain the destinations of their network transfers. New opportunities emerge when such transfers contribute a large amount of network bytes. By choosing the endpoints to avoid congested links, completion times of these transfers as well as that of others without similar flexibility can be improved. In this paper, we focus on leveraging the flexibility in replica placement during writes to cluster file systems (CFSes), which account for almost half of all cross-rack traffic in data-intensive clusters. The replicas of a CFS write can be placed in any subset of machines as long as they are in multiple fault domains and ensure a balanced use of storage throughout the cluster. We study CFS interactions with the cluster network, analyze optimizations for replica placement, and propose Sinbad -- a system that identifies imbalance and adapts replica destinations to navigate around congested links. Experiments on EC2 and trace-driven simulations show that block writes complete 1.3X (respectively, 1.58X) faster as the network becomes more balanced. As a collateral benefit, end-to-end completion times of data-intensive jobs improve as well. 
Sinbad does so with little impact on the long-term storage balance.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114199803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trinocular: understanding internet reliability through adaptive probing","authors":"Lin Quan, J. Heidemann, Y. Pradkin","doi":"10.1145/2486001.2486017","DOIUrl":"https://doi.org/10.1145/2486001.2486017","url":null,"abstract":"Natural and human factors cause Internet outages---from big events like Hurricane Sandy in 2012 and the Egyptian Internet shutdown in Jan. 2011 to small outages every day that go unpublicized. We describe Trinocular, an outage detection system that uses active probing to understand reliability of edge networks. Trinocular is principled: deriving a simple model of the Internet that captures the information pertinent to outages, and populating that model through long-term data, and learning current network state through ICMP probes. It is parsimonious, using Bayesian inference to determine how many probes are needed. On average, each Trinocular instance sends fewer than 20 probes per hour to each /24 network block under study, increasing Internet \"background radiation\" by less than 0.7%. Trinocular is also predictable and precise: we provide known precision in outage timing and duration. Probing in rounds of 11 minutes, we detect 100% of outages one round or longer, and estimate outage duration within one-half round. Since we require little traffic, a single machine can track 3.4M /24 IPv4 blocks, all of the Internet currently suitable for analysis. We show that our approach is significantly more accurate than the best current methods, with about one-third fewer false conclusions, and about 30% greater coverage at constant accuracy. 
We validate our approach using controlled experiments, use Trinocular to analyze two days of Internet outages observed from three sites, and re-analyze three years of existing data to develop trends for the Internet.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114239686","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Data center networks 2","authors":"V. Padmanabhan","doi":"10.1145/3261526","DOIUrl":"https://doi.org/10.1145/3261526","url":null,"abstract":"","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"54 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133290610","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Named data networking on a router: forwarding at 20gbps and beyond","authors":"W. So, A. Narayanan, D. Oran, M. Stapp","doi":"10.1145/2486001.2491699","DOIUrl":"https://doi.org/10.1145/2486001.2491699","url":null,"abstract":"Named data networking (NDN) is a new networking paradigm using named data instead of named hosts for communication. Implementation of scalable NDN packet forwarding remains a challenge because NDN requires fast variable-length hierarchical name-based lookup, per-packet data plane state update, and large-scale forwarding tables. We have designed and implemented an NDN data plane with a software forwarding engine on an Intel Xeon-based line card in a Cisco ASR9000 router. In order to achieve high-speed forwarding, our design features (1) name lookup via hash tables with fast collision-resistant hash computation, (2) an efficient and secure FIB lookup algorithm that provides good average and bounded worst-case FIB lookup time, (3) PIT partitioning that enables linear multi-core speedup, and (4) an optimized data structure and software prefetching to maximize data cache utilization. In this demonstration, we showcase our NDN router implementation on the ASR9000 and demonstrate that it can forward real NDN traffic at 20Gbps or higher.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"125 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114310757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Speeding up distributed request-response workflows","authors":"Virajith Jalaparti, P. Bodík, Srikanth Kandula, Ishai Menache, M. Rybalkin, Chenyun Yan","doi":"10.1145/2486001.2486028","DOIUrl":"https://doi.org/10.1145/2486001.2486028","url":null,"abstract":"We found that interactive services at Bing have highly variable datacenter-side processing latencies because their processing consists of many sequential stages, parallelization across 10s-1000s of servers and aggregation of responses across the network. To improve the tail latency of such services, we use a few building blocks: reissuing laggards elsewhere in the cluster, new policies to return incomplete results and speeding up laggards by giving them more resources. Combining these building blocks to reduce the overall latency is non-trivial because for the same amount of resource (e.g., number of reissues), different stages improve their latency by different amounts. We present Kwiken, a framework that takes an end-to-end view of latency improvements and costs. It decomposes the problem of minimizing latency over a general processing DAG into a manageable optimization over individual stages. 
Through simulations with production traces, we show sizable gains; the 99th percentile of latency improves by over 50% when just 0.1% of the responses are allowed to have partial results and by over 40% for 25% of the services when just 5% extra resources are used for reissues.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122321144","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving high utilization with software-driven WAN","authors":"C. Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, M. Nanduri, Roger Wattenhofer","doi":"10.1145/2486001.2486012","DOIUrl":"https://doi.org/10.1145/2486001.2486012","url":null,"abstract":"We present SWAN, a system that boosts the utilization of inter-datacenter networks by centrally controlling when and how much traffic each service sends and frequently re-configuring the network's data plane to match current traffic demand. But done simplistically, these re-configurations can also cause severe, transient congestion because different switches may apply updates at different times. We develop a novel technique that leverages a small amount of scratch capacity on links to apply updates in a provably congestion-free manner, without making any assumptions about the order and timing of updates at individual switches. Further, to scale to large networks in the face of limited forwarding table capacity, SWAN greedily selects a small set of entries that can best satisfy current demand. It updates this set without disrupting traffic by leveraging a small amount of scratch capacity in forwarding tables. Experiments using a testbed prototype and data-driven simulations of two production networks show that SWAN carries 60% more traffic than the current practice.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127740968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TCP ex machina: computer-generated congestion control","authors":"Keith Winstein, H. Balakrishnan","doi":"10.1145/2486001.2486020","DOIUrl":"https://doi.org/10.1145/2486001.2486020","url":null,"abstract":"This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint's reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion-control algorithms to run at the endpoints. In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm---the control rules for the independent endpoints---that tries to achieve this objective. In simulations with ns-2, Remy-generated algorithms outperformed human-designed end-to-end techniques, including TCP Cubic, Compound, and Vegas. In many cases, Remy's algorithms also outperformed methods that require intrusive in-network changes, including XCP and Cubic-over-sfqCoDel (stochastic fair queueing with CoDel for active queue management). Remy can generate algorithms both for networks where some parameters are known tightly a priori, e.g. datacenters, and for networks where prior knowledge is less precise, such as cellular networks. 
We characterize the sensitivity of the resulting performance to the specificity of the prior knowledge, and the consequences when real-world conditions contradict the assumptions supplied at design-time.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121903498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Session details: Network measurement","authors":"V. Sekar","doi":"10.1145/3246331","DOIUrl":"https://doi.org/10.1145/3246331","url":null,"abstract":"","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117149050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Greedy forwarding for mobile social networks embedded in hyperbolic spaces","authors":"Jingwei Zhang","doi":"10.1145/2486001.2491728","DOIUrl":"https://doi.org/10.1145/2486001.2491728","url":null,"abstract":"In this work, we design and evaluate a novel greedy forwarding algorithm using metrics in hyperbolic spaces. Hyperbolic geometry has a natural topological reflection of scale-free networks, and greedy algorithm failed in Euclidean space becomes possible in hyperbolic one. We show that mobile social networks can be successfully embedded in such spaces, and obtains competitive performance in terms of message delivery ratio and cost. Under this result, we thus intuitively reveal the fundamental reason that why the famous BUBBLE Rap achieves the optimal performance.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125050705","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On HTTP live streaming in large enterprises","authors":"Roberto Roverso, Sameh El-Ansary, Mikael Högqvist","doi":"10.1145/2486001.2491685","DOIUrl":"https://doi.org/10.1145/2486001.2491685","url":null,"abstract":"In this work, we present a distributed caching solution which addresses the problem of efficient delivery of HTTP live streams in large private networks. With our system, we have conducted tests on a number of pilot deployments. The largest of them, with 3000 concurrent viewers, consistently showed that our system saves more than 90% of traffic towards the source of the stream while providing the same quality of user experience of a CDN. Another result is that our solution was able to reduce the load on the bottlenecks in the network by an average of 91.6%.","PeriodicalId":159374,"journal":{"name":"Proceedings of the ACM SIGCOMM 2013 conference on SIGCOMM","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2013-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134490998","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}