{"title":"MALT","authors":"Hao Li, Asim Kadav, E. Kruus, C. Ungureanu","doi":"10.1145/2741948.2741965","DOIUrl":"https://doi.org/10.1145/2741948.2741965","url":null,"abstract":"Machine learning methods, such as SVM and neural networks, often improve their accuracy by using models with more parameters trained on large numbers of examples. Building such models on a single machine is often impractical because of the large amount of computation required. We introduce MALT, a machine learning library that integrates with existing machine learning software and provides data parallel machine learning. MALT provides abstractions for fine-grained in-memory updates using one-sided RDMA, limiting data movement costs during incremental model updates. MALT allows machine learning developers to specify the dataflow and apply communication and representation optimizations. Through its general-purpose API, MALT can be used to provide data-parallelism to existing ML applications written in C++ and Lua and based on SVM, matrix factorization and neural networks. In our results, we show MALT provides fault tolerance, network efficiency and speedup to these applications.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130645242","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Peng Huang, W. Bolosky, Abhishek Singh, Yuanyuan Zhou
{"title":"ConfValley: a systematic configuration validation framework for cloud services","authors":"Peng Huang, W. Bolosky, Abhishek Singh, Yuanyuan Zhou","doi":"10.1145/2741948.2741963","DOIUrl":"https://doi.org/10.1145/2741948.2741963","url":null,"abstract":"Studies and many incidents in the headlines suggest misconfigurations remain a major cause of unavailability in large systems despite the large amount of work put into detecting, diagnosing and repairing them. In part, this is because many of the solutions are either post-mortem or too expensive to use in production cloud-scale systems. Configuration validation is the process of explicitly defining specifications and proactively checking configurations against those specifications to prevent misconfigurations from entering production. We propose a generic framework, ConfValley, to make configuration validation easy, systematic and efficient, and to allow configuration validation as an ordinary part of system deployment. ConfValley consists of a declarative language for practitioners to express configuration specifications, an inference engine that automatically generates specifications, and a checker that determines if a given configuration obeys its specifications. Our specification language expressed the configuration validation code from Microsoft Azure in 10x fewer lines, many of which were automatically inferred. Using expert-written and inferred specifications, we detected a number of configuration errors in the latest configurations deployed in Microsoft Azure.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127966563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deriving and comparing deduplication techniques using a model-based classification","authors":"J. Kaiser, A. Brinkmann, Tim Süß, Dirk Meister","doi":"10.1145/2741948.2741952","DOIUrl":"https://doi.org/10.1145/2741948.2741952","url":null,"abstract":"Data deduplication has been a hot research topic and a large number of systems have been developed. These systems are usually seen as an inherently linked set of characteristics. However, a detailed analysis shows independent concepts that can be used in other systems. In this work, we perform this analysis on the main representatives of deduplication systems. We embed the results in a model, which shows two yet unexplored combinations of characteristics. In addition, the model enables a comprehensive evaluation of the representatives and the two new systems. We perform this evaluation based on real world data sets.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"79 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116277857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, I. Keidar
{"title":"Scaling concurrent log-structured data stores","authors":"Guy Golan-Gueta, Edward Bortnikov, Eshcar Hillel, I. Keidar","doi":"10.1145/2741948.2741973","DOIUrl":"https://doi.org/10.1145/2741948.2741973","url":null,"abstract":"Log-structured data stores (LSM-DSs) are widely accepted as the state-of-the-art implementation of key-value stores. They replace random disk writes with sequential I/O, by accumulating large batches of updates in an in-memory data structure and merging it with the on-disk store in the background. While LSM-DS implementations proved to be highly successful at masking the I/O bottleneck, scaling them up on multicore CPUs remains a challenge. This is nontrivial due to their often rich APIs, as well as the need to coordinate the RAM access with the background I/O. We present cLSM, an algorithm for scalable concurrency in LSM-DS, which exploits multiprocessor-friendly data structures and non-blocking synchronization. cLSM supports a rich API, including consistent snapshot scans and general non-blocking read-modify-write operations. We implement cLSM based on the popular LevelDB key-value store, and evaluate it using intensive synthetic workloads as well as ones from production web-serving applications. Our algorithm outperforms state of the art LSM-DS implementations, improving throughput by 1.5x to 2.5x. Moreover, cLSM demonstrates superior scalability with the number of cores (successfully exploiting twice as many cores as the competition).","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127916498","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ionel Gog, Malte Schwarzkopf, Natacha Crooks, Matthew P. Grosvenor, Allen Clement, S. Hand
{"title":"Musketeer: all for one, one for all in data processing systems","authors":"Ionel Gog, Malte Schwarzkopf, Natacha Crooks, Matthew P. Grosvenor, Allen Clement, S. Hand","doi":"10.1145/2741948.2741968","DOIUrl":"https://doi.org/10.1145/2741948.2741968","url":null,"abstract":"Many systems for the parallel processing of big data are available today. Yet, few users can tell by intuition which system, or combination of systems, is \"best\" for a given workflow. Porting workflows between systems is tedious. Hence, users become \"locked in\", despite faster or more efficient systems being available. This is a direct consequence of the tight coupling between user-facing front-ends that express workflows (e.g., Hive, SparkSQL, Lindi, GraphLINQ) and the back-end execution engines that run them (e.g., MapReduce, Spark, PowerGraph, Naiad). We argue that the ways that workflows are defined should be decoupled from the manner in which they are executed. To explore this idea, we have built Musketeer, a workflow manager which can dynamically map front-end workflow descriptions to a broad range of back-end execution engines. Our prototype maps workflows expressed in four high-level query languages to seven different popular data processing systems. Musketeer speeds up realistic workflows by up to 9x by targeting different execution engines, without requiring any manual effort. Its automatically generated back-end code comes within 5%--30% of the performance of hand-optimized implementations.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"35 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125317802","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Radu Stoenescu, V. Olteanu, Matei Popovici, Mohamed Ahmed, J. Martins, R. Bifulco, Filipe Manco, Felipe Huici, Georgios Smaragdakis, M. Handley, C. Raiciu
{"title":"In-Net: in-network processing for the masses","authors":"Radu Stoenescu, V. Olteanu, Matei Popovici, Mohamed Ahmed, J. Martins, R. Bifulco, Filipe Manco, Felipe Huici, Georgios Smaragdakis, M. Handley, C. Raiciu","doi":"10.1145/2741948.2741961","DOIUrl":"https://doi.org/10.1145/2741948.2741961","url":null,"abstract":"Network Function Virtualization is pushing network operators to deploy commodity hardware that will be used to run middlebox functionality and processing on behalf of third parties: in effect, network operators are slowly but surely becoming in-network cloud providers. The market for innetwork clouds is large, ranging from content providers, mobile applications and even end-users. We show in this paper that blindly adopting cloud technologies in the context of in-network clouds is not feasible from both the security and scalability points of view. Instead we propose In-Net, an architecture that allows untrusted endpoints as well as content-providers to deploy custom in-network processing to be run on platforms owned by network operators. In-Net relies on static analysis to allow platforms to check whether the requested processing is safe, and whether it contradicts the operator's policies. We have implemented In-Net and tested it in the wide-area, supporting a range of use-cases that are difficult to deploy today. Our experience shows that In-Net is secure, scales to many users (thousands of clients on a single inexpensive server), allows for a wide-range of functionality, and offers benefits to end-users, network operators and content providers alike.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131457719","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
You Zhou, Fei Wu, Ping Huang, Xubin He, C. Xie, Jian Zhou
{"title":"An efficient page-level FTL to optimize address translation in flash memory","authors":"You Zhou, Fei Wu, Ping Huang, Xubin He, C. Xie, Jian Zhou","doi":"10.1145/2741948.2741949","DOIUrl":"https://doi.org/10.1145/2741948.2741949","url":null,"abstract":"Flash-based solid state disks (SSDs) have been very popular in consumer and enterprise storage markets due to their high performance, low energy, shock resistance, and compact sizes. However, the increasing SSD capacity imposes great pressure on performing efficient logical to physical address translation in a page-level flash translation layer (FTL). Existing schemes usually employ a built-in RAM cache for storing mapping information, called the mapping cache, to speed up the address translation. Since only a fraction of the mapping table can be cached due to limited cache space, a large number of extra operations to flash memory are required for cache management and garbage collection, degrading the performance and lifetime of an SSD. In this paper, we first apply analytical models to investigate the key factors that incur extra operations. Then, we propose an efficient page-level FTL, named TPFTL, which employs two-level LRU lists to organize cached mapping entries to minimize the extra operations. Inspired by the models, we further design a workload-adaptive loading policy combined with an efficient replacement policy to increase the cache hit ratio and reduce the writebacks of replaced dirty entries. Finally, we evaluate TPFTL using extensive trace-driven simulations. Our evaluation results show that compared to the state-of-the-art FTLs, TPFTL reduces random writes caused by address translation by an average of 62% and improves the response time by up to 24%.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"126 8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132770026","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prateek Sharma, Stephen Lee, Tian Guo, David E. Irwin, P. Shenoy
{"title":"SpotCheck: designing a derivative IaaS cloud on the spot market","authors":"Prateek Sharma, Stephen Lee, Tian Guo, David E. Irwin, P. Shenoy","doi":"10.1145/2741948.2741953","DOIUrl":"https://doi.org/10.1145/2741948.2741953","url":null,"abstract":"Infrastructure-as-a-Service (IaaS) cloud platforms rent resources, in the form of virtual machines (VMs), under a variety of contract terms that offer different levels of risk and cost. For example, users may acquire VMs in the spot market that are often cheap but entail significant risk, since their price varies over time based on market supply and demand and they may terminate at any time if the price rises too high. Currently, users must manage all the risks associated with using spot servers. As a result, conventional wisdom holds that spot servers are only appropriate for delay-tolerant batch applications. In this paper, we propose a derivative cloud platform, called SpotCheck, that transparently manages the risks associated with using spot servers for users. SpotCheck provides the illusion of an IaaS platform that offers always-available VMs on demand for a cost near that of spot servers, and supports all types of applications, including interactive ones. SpotCheck's design combines the use of nested VMs with live bounded-time migration and novel server pool management policies to maximize availability, while balancing risk and cost. We implement SpotCheck on Amazon's EC2 and show that it i) provides nested VMs to users that are 99.9989% available, ii) achieves nearly 5x cost savings compared to using equivalent types of on-demand VMs, and iii) eliminates any risk of losing VM state.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133891952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Abhishek Verma, Luis Pedrosa, M. Korupolu, D. Oppenheimer, Eric Tune, J. Wilkes
{"title":"Large-scale cluster management at Google with Borg","authors":"Abhishek Verma, Luis Pedrosa, M. Korupolu, D. Oppenheimer, Eric Tune, J. Wilkes","doi":"10.1145/2741948.2741964","DOIUrl":"https://doi.org/10.1145/2741948.2741964","url":null,"abstract":"Google's Borg system is a cluster manager that runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines. It achieves high utilization by combining admission control, efficient task-packing, over-commitment, and machine sharing with process-level performance isolation. It supports high-availability applications with runtime features that minimize fault-recovery time, and scheduling policies that reduce the probability of correlated failures. Borg simplifies life for its users by offering a declarative job specification language, name service integration, real-time job monitoring, and tools to analyze and simulate system behavior. We present a summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127879898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Joshua B. Leners, Trinabh Gupta, M. Aguilera, Michael Walfish
{"title":"Taming uncertainty in distributed systems with help from the network","authors":"Joshua B. Leners, Trinabh Gupta, M. Aguilera, Michael Walfish","doi":"10.1145/2741948.2741976","DOIUrl":"https://doi.org/10.1145/2741948.2741976","url":null,"abstract":"Network and process failures cause complexity in distributed applications. When a remote process does not respond, the application cannot tell if the process or network have failed, or if they are just slow. Without this information, applications can lose availability or correctness. To address this problem, we propose Albatross, a service that quickly reports to applications the current status of a remote process---whether it is working and reachable, or not. Albatross is targeted at data centers equipped with software defined networks (SDNs), allowing it to discover and enforce network partitions: Albatross borrows the old observation that it can be better to cause a problem than to live with uncertainty, and applies this idea to networks. When enforcing partitions, Albatross avoids disruption by disconnecting only individual processes (not entire hosts), and by allowing them to reconnect if the application chooses. We show that, under Albatross, distributed applications can bypass the complexity caused by network failures and that they become more available.","PeriodicalId":119291,"journal":{"name":"Proceedings of the Tenth European Conference on Computer Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2015-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124216524","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}