Proceedings of the Eleventh European Conference on Computer Systems: latest papers

Yoda: a highly available layer-7 load balancer

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901352

Rohan Gandhi, Charlie Hu, Ming Zhang

Abstract: Layer-7 load balancing is a foundational building block of online services. The lack of offerings from major public cloud providers has left online services to build their own load balancers (LBs) or use third-party LB designs such as HAProxy. The key problem with such proxy-based designs is that each proxy instance is a single point of failure: when it fails, the TCP flow state for the connections with the client and server is lost, which breaks the user flows. This significantly affects user experience and online services' revenue. In this paper, we present Yoda, a highly available, scalable and low-latency L7-LB-as-a-service in a public cloud. Yoda is based on two design principles we propose for achieving high availability of an L7 LB: decoupling the flow state from the LB instances and storing it in a persistent storage, and leveraging the L4 LB service to enable each L7 LB instance to use the virtual IP in interacting with both the client and the server (called front-and-back indirection). Our evaluation of the Yoda prototype on a 60-VM testbed in Windows Azure shows that the overhead of decoupling TCP state into a persistent storage is very low (<1 msec), and that Yoda maintains all flows during LB instance failures, additions, and removals, as well as user policy updates. Our simulation, driven by a one-day trace from production online services, shows that, compared to each tenant using Yoda separately, Yoda-as-a-service reduces L7 LB instance cost for the tenants by 3.7x while providing 4x more redundancy.
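The first design principle in the Yoda abstract, keeping flow state outside the LB instances, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: `FlowStore`, `FlowState`, `LBInstance`, and `handle_packet` are hypothetical names, the in-memory dictionary merely stands in for a persistent store, and real TCP state is far richer than one sequence number. It is not Yoda's implementation.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical flow key: (client_ip, client_port, virtual_ip, virtual_port).
FlowKey = Tuple[str, int, str, int]


@dataclass
class FlowState:
    backend: str          # server chosen when the flow was first seen
    last_client_seq: int  # toy stand-in for the real TCP flow state


class FlowStore:
    """Stand-in for a persistent store shared by all LB instances."""

    def __init__(self) -> None:
        self._flows: Dict[FlowKey, FlowState] = {}

    def get(self, key: FlowKey) -> Optional[FlowState]:
        return self._flows.get(key)

    def put(self, key: FlowKey, state: FlowState) -> None:
        self._flows[key] = state


class LBInstance:
    """An LB instance that keeps no per-flow state of its own."""

    def __init__(self, name: str, store: FlowStore, backends: List[str]) -> None:
        self.name = name
        self.store = store
        self.backends = backends

    def handle_packet(self, key: FlowKey, client_seq: int) -> str:
        state = self.store.get(key)
        if state is None:
            # New flow: pick a backend deterministically and record the choice.
            index = zlib.crc32(repr(key).encode()) % len(self.backends)
            state = FlowState(self.backends[index], client_seq)
        else:
            # Known flow: reuse the recorded backend, whichever instance we are.
            state.last_client_seq = client_seq
        self.store.put(key, state)
        return state.backend


store = FlowStore()
flow = ("203.0.113.7", 51000, "198.51.100.1", 443)

lb_a = LBInstance("lb-a", store, ["srv-1", "srv-2", "srv-3"])
first_backend = lb_a.handle_packet(flow, client_seq=1000)

del lb_a  # simulate the failure of the first instance

# A replacement instance, even with a reordered backend list, resumes the flow.
lb_b = LBInstance("lb-b", store, ["srv-3", "srv-2", "srv-1"])
second_backend = lb_b.handle_packet(flow, client_seq=2460)
```

Because the backend choice lives in the shared store rather than in `lb_a`, the replacement instance sends the flow to the same server. The second principle (front-and-back indirection through the L4 LB) is not modeled here.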

Citations: 32

POSIX abstractions in modern operating systems: the old, the new, and the missing

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901350

Vaggelis Atlidakis, Jeremy Andrus, Roxana Geambasu, Dimitris Mitropoulos, Jason Nieh

Citations: 40

On the capacity of thermal covert channels in multicores

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901322

D. Bartolini, Philipp Miedl, L. Thiele

Abstract: Modern multicore processors feature easily accessible temperature sensors that provide useful information for dynamic thermal management. These sensors were recently shown to be a potential security threat, since otherwise isolated applications can exploit them to establish a thermal covert channel and leak restricted information. Previous research showed experiments that document the feasibility of (low-rate) communication over this channel, but did not further analyze its fundamental characteristics. For this reason, the important questions of quantifying the channel capacity and achievable rates remain unanswered. To address these questions, we devise and exploit a new methodology that leverages both theoretical results from information theory and experimental data to study these thermal covert channels on modern multicores. We use spectral techniques to analyze data from two representative platforms and estimate the capacity of the channels from a source application to temperature sensors on the same or different cores. We estimate the capacity to be in the order of 300 bits per second (bps) for the same-core channel, i.e., when reading the temperature on the same core where the source application runs, and in the order of 50 bps for the 1-hop channel, i.e., when reading the temperature of the core physically next to the one where the source application runs. Moreover, we show a communication scheme that achieves rates of more than 45 bps on the same-core channel and more than 5 bps on the 1-hop channel, with less than 1% error probability. The highest rate shown in previous work was 1.33 bps on the 1-hop channel with 11% error probability.

Citations: 62

HAFT: hardware-assisted fault tolerance

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901339

Dmitrii Kuvaiskii, Rasha Faqeh, Pramod Bhatotia, P. Felber, C. Fetzer

Citations: 51

NChecker: saving mobile app developers from network disruptions

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901353

Xinxin Jin, Peng Huang, Tianyin Xu, Yuanyuan Zhou

Abstract: Most of today's mobile apps rely on the underlying networks to deliver key functions such as web browsing, file synchronization, and social networking. Compared to desktop-based networks, mobile networks are much more dynamic, with frequent connectivity disruptions, network type switches, and quality changes, posing unique programming challenges for mobile app developers. As revealed in this paper, many mobile app developers fail to handle these intermittent network conditions in their mobile network programming. Consequently, network programming defects (NPDs) are pervasive in mobile apps, causing bad user experiences such as crashes and data loss. Despite the development of network libraries in the hope of lifting the developers' burden, we observe that many app developers fail to use these libraries properly and still introduce NPDs. In this paper, we study the characteristics of real-world NPDs in Android apps towards a deep understanding of their impacts, root causes, and code patterns. Driven by the study, we build NChecker, a practical tool to detect NPDs by statically analyzing Android app binaries. NChecker has been applied to hundreds of real Android apps and detected 4180 NPDs from 285 randomly selected apps with 94+% accuracy. Our further analysis of these defects reveals the common mistakes of app developers in working with the existing network libraries' abstractions, which provides insights for improving the usability of mobile network libraries.

Citations: 12

STRADS: a distributed framework for scheduled model parallel machine learning

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901331

Jin Kyu Kim, Qirong Ho, Seunghak Lee, Xun Zheng, Wei Dai, Garth A. Gibson, E. Xing

Abstract: Machine learning (ML) algorithms are commonly applied to big data, using distributed systems that partition the data across machines and allow each machine to read and update all ML model parameters, a strategy known as data parallelism. An alternative and complementary strategy, model parallelism, partitions the model parameters for non-shared parallel access and updates, and may periodically repartition the parameters to facilitate communication. Model parallelism is motivated by two challenges that data parallelism does not usually address: (1) parameters may be dependent, thus naive concurrent updates can introduce errors that slow convergence or even cause algorithm failure; (2) model parameters converge at different rates, thus a small subset of parameters can bottleneck ML algorithm completion. We propose scheduled model parallelism (SchMP), a programming approach that improves ML algorithm convergence speed by efficiently scheduling parameter updates, taking into account parameter dependencies and uneven convergence. To support SchMP at scale, we develop a distributed framework, STRADS, which optimizes the throughput of SchMP programs, and benchmark four common ML applications written as SchMP programs: LDA topic modeling, matrix factorization, sparse least-squares (Lasso) regression, and sparse logistic regression. By improving ML progress per iteration through SchMP programming while improving iteration throughput through STRADS, we show that SchMP programs running on STRADS outperform non-model-parallel ML implementations: for example, SchMP LDA and SchMP Lasso respectively achieve 10x and 5x faster convergence than recent, well-established baselines.
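The scheduling idea in the STRADS abstract (update the parameters that matter most, but never update dependent parameters concurrently) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the STRADS API: `schedule_updates`, the `priority` map (for example, the magnitude of each parameter's most recent change), and the `dependent` pair set are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def schedule_updates(
    priority: Dict[str, float],
    dependent: Set[FrozenSet[str]],
    batch_size: int,
) -> List[str]:
    """Pick up to batch_size parameters to update in parallel.

    Parameters are visited from highest to lowest priority; a parameter is
    skipped if it is marked as dependent on one that was already chosen.
    """
    chosen: List[str] = []
    # Sort by descending priority, breaking ties by name for determinism.
    for name in sorted(priority, key=lambda n: (-priority[n], n)):
        if len(chosen) == batch_size:
            break
        if all(frozenset((name, other)) not in dependent for other in chosen):
            chosen.append(name)
    return chosen


# Hypothetical per-parameter priorities and one dependent pair.
priority = {"w1": 0.9, "w2": 0.8, "w3": 0.05, "w4": 0.4}
dependent = {frozenset(("w1", "w2"))}

batch = schedule_updates(priority, dependent, batch_size=2)
print(batch)  # ['w1', 'w4']
```

Here `w2` has the second-highest priority but is skipped because it depends on `w1`, so `w4` takes its place in the batch. A scheduler for a real workload would also need a way to estimate dependencies and priorities cheaply, which this sketch takes as given.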

Citations: 80

TetriSched: global rescheduling with adaptive plan-ahead in dynamic heterogeneous clusters

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901355

Alexey Tumanov, T. Zhu, J. Park, M. Kozuch, Mor Harchol-Balter, G. Ganger

Citations: 176

Optimizing distributed actor systems for dynamic interactive services

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901343

Andrew Newell, G. Kliot, Ishai Menache, Aditya Gopalan, Soramichi Akiyama, M. Silberstein

Abstract: Distributed actor systems are widely used for developing interactive scalable cloud services, such as social networks and on-line games. By modeling an application as a dynamic set of lightweight communicating "actors", developers can easily build complex distributed applications, while the underlying runtime system deals with low-level complexities of a distributed environment. We present ActOp, a data-driven, application-independent runtime mechanism for optimizing end-to-end service latency of actor-based distributed applications. ActOp targets the two dominant factors affecting latency: the overhead of remote inter-actor communications across servers, and the intra-server queuing delay. ActOp automatically identifies frequently communicating actors and migrates them to the same server transparently to the running application. The migration decisions are driven by a novel scalable distributed graph partitioning algorithm which does not rely on a single server to store the whole communication graph, thereby enabling efficient actor placement even for applications with rapidly changing graphs (e.g., chat services). Further, each server autonomously reduces the queuing delay by learning an internal queuing model and configuring threads according to instantaneous request rate and application demands. We prototype ActOp by integrating it with Orleans, a popular open-source actor system [4, 13]. Experiments with realistic workloads show latency improvements of up to 75% for the 99th percentile, up to 63% for the mean, with up to 2x increase in peak system throughput.
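The co-location idea in the ActOp abstract (move an actor next to the actors it talks to most) can be illustrated with a small greedy heuristic in Python. This is a deliberately simplified, centralized sketch and not ActOp's distributed graph partitioning algorithm: `remote_messages`, `best_server`, the actor names, and the message counts are hypothetical, and server load balance is ignored.

```python
from collections import defaultdict
from typing import Dict, Tuple

Placement = Dict[str, str]            # actor -> server
Traffic = Dict[Tuple[str, str], int]  # (actor, actor) -> observed message count


def remote_messages(placement: Placement, traffic: Traffic) -> int:
    """Count messages exchanged between actors on different servers."""
    return sum(n for (a, b), n in traffic.items() if placement[a] != placement[b])


def best_server(actor: str, placement: Placement, traffic: Traffic) -> str:
    """Return the server hosting the peers this actor exchanges most messages with."""
    per_server: Dict[str, int] = defaultdict(int)
    for (a, b), n in traffic.items():
        if a == actor:
            per_server[placement[b]] += n
        elif b == actor:
            per_server[placement[a]] += n
    if not per_server:
        return placement[actor]  # no observed traffic: stay put
    # Prefer the heaviest server; on a tie, prefer staying where the actor is.
    return max(per_server, key=lambda s: (per_server[s], s == placement[actor]))


placement = {"alice": "s1", "bob": "s2", "carol": "s2", "dave": "s1"}
traffic = {
    ("alice", "bob"): 50,
    ("alice", "carol"): 30,
    ("alice", "dave"): 5,
    ("bob", "carol"): 20,
}

before = remote_messages(placement, traffic)      # 80
target = best_server("alice", placement, traffic) # 's2'
placement["alice"] = target                       # migrate the actor
after = remote_messages(placement, traffic)       # 5
```

Migrating `alice` to `s2` turns her two heaviest conversations into local ones, leaving only the light exchange with `dave` remote. A practical placement algorithm would also have to cap per-server load, otherwise every actor could drift onto one machine.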

Citations: 23

Flash storage disaggregation

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901337

Ana Klimovic, C. Kozyrakis, Eno Thereska, Binu John, Sanjeev Kumar

Citations: 123

Juggler: a practical reordering resilient network stack for datacenters

Proceedings of the Eleventh European Conference on Computer Systems Pub Date : 2016-04-18 DOI: 10.1145/2901318.2901334

Yilong Geng, V. Jeyakumar, A. Kabbani, Mohammad Alizadeh

Citations: 40