Proceedings of the 19th International Middleware Conference: Latest Publications

Olympian
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274813
Authors: Yitao Hu, S. Rallapalli, B. Ko, R. Govindan
Abstract: Deep neural networks (DNNs) are emerging as important drivers for GPU (Graphics Processing Unit) usage. Cloud offerings now routinely include GPU-capable VMs, and GPUs are used for training and testing DNNs. A popular way to run inference (or testing) tasks with DNNs is to use middleware called a serving system. Tensorflow-Serving (TF-Serving) is an example of a DNN serving system. In this paper, we consider the problem of carefully scheduling multiple concurrent DNNs in a serving system on a single GPU to achieve fairness or service differentiation objectives, a capability crucial to cloud-based TF-Serving offerings. In scheduling DNNs, we face two challenges: how to schedule, and switch between, different DNN jobs at low overhead; and how to account for their usage. Our system, Olympian, extends TF-Serving to enable fair sharing of a GPU across multiple concurrent large DNNs at low overhead, a capability TF-Serving by itself is not able to achieve. Specifically, Olympian can run concurrent instances of several large DNN models such as Inception, ResNet, GoogLeNet, AlexNet, and VGG, provide each with an equal share of the GPU, interleave them at timescales of 1-2 ms, and incur an overhead of less than 2%. It achieves this by leveraging the predictability of GPU computations to profile GPU resource usage models offline, then using these profiles to achieve low-overhead switching between DNNs.
Citations: 3
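The core mechanism the abstract describes is scheduling from offline GPU profiles. Below is a minimal Python sketch of that idea, with invented model names and per-inference costs and a simple least-attained-service policy standing in for Olympian's actual scheduler; it is an illustration of the approach, not the system's code.

```python
# Sketch: fair GPU sharing driven by offline-profiled per-inference costs.
# The scheduler always serves the backlogged model that has received the
# least GPU time so far and charges it the profiled cost of one request.
# Model names and costs below are illustrative assumptions.
from collections import deque

PROFILE_MS = {"inception": 6.0, "resnet": 8.0, "vgg": 12.0}  # offline profiles

class FairGpuScheduler:
    def __init__(self, models):
        self.queues = {m: deque() for m in models}
        self.used_ms = {m: 0.0 for m in models}    # GPU time attained so far

    def submit(self, model, request):
        self.queues[model].append(request)

    def next_request(self):
        """Pick the backlogged model with the least attained GPU time."""
        backlogged = [m for m, q in self.queues.items() if q]
        if not backlogged:
            return None
        model = min(backlogged, key=lambda m: self.used_ms[m])
        self.used_ms[model] += PROFILE_MS[model]   # charge the profiled cost
        return model, self.queues[model].popleft()

sched = FairGpuScheduler(PROFILE_MS)
for i in range(6):
    sched.submit("inception", f"req{i}")
    sched.submit("vgg", f"req{i}")
while (job := sched.next_request()) is not None:
    print(job)   # inception is scheduled more often so both get similar GPU time
```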
Omni
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274821
Authors: T. Kalbarczyk, C. Julien
Abstract: Device-to-device (D2D) communication technologies are growing in availability and popularity and are commonly used to facilitate applications in Internet of Things (IoT) environments. Such environments are characterized by heterogeneous devices, often employing diverse communication technologies with varying energy consumption, discovery ranges, and transmission rates. These complexities pose a daunting setting for the development of IoT applications that could leverage direct communication with proximal mobile and embedded devices. While current approaches focus either on device discovery in the IoT setting or on content transfer assuming established communication channels, none facilitate the intelligent discovery of useful devices and the seamless formation of temporary D2D connections to transfer content directly between devices. Our Omni middleware provides both of these features, which are critical to the development of applications that leverage proximal devices in IoT settings. Using Omni, we demonstrate the feasibility of building applications that use heterogeneous D2D communication channels in an efficient and realistic (in terms of energy and time) manner.
Citations: 0
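The trade-off the abstract describes, heterogeneous channels with different energy and time costs, can be made concrete with a toy channel-selection model. The figures and the `pick_channel` helper below are illustrative assumptions, not Omni's API or measurements.

```python
# Sketch: choosing between heterogeneous D2D links (e.g., BLE vs. Wi-Fi Direct)
# by estimated transfer time or energy. All numbers are illustrative.
from dataclasses import dataclass

@dataclass
class Channel:
    name: str
    setup_s: float          # discovery / connection-establishment time
    throughput_mbps: float
    power_w: float          # average radio power while transferring

    def transfer_time_s(self, payload_mb: float) -> float:
        return self.setup_s + (payload_mb * 8) / self.throughput_mbps

    def energy_j(self, payload_mb: float) -> float:
        return self.power_w * self.transfer_time_s(payload_mb)

CHANNELS = [
    Channel("BLE",         setup_s=0.5, throughput_mbps=0.8,  power_w=0.05),
    Channel("WiFi-Direct", setup_s=3.0, throughput_mbps=80.0, power_w=0.7),
]

def pick_channel(payload_mb: float, objective: str = "energy") -> Channel:
    key = (lambda c: c.energy_j(payload_mb)) if objective == "energy" \
          else (lambda c: c.transfer_time_s(payload_mb))
    return min(CHANNELS, key=key)

# Small payloads tend to favour BLE, large ones Wi-Fi Direct:
print(pick_channel(0.1).name, pick_channel(50).name)
```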
GeneaLog: Fine-Grained Data Streaming Provenance at the Edge
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274826
Authors: Dimitris Palyvos-Giannas, Vincenzo Gulisano, M. Papatriantafilou
Abstract: Fine-grained data provenance in data streaming allows linking each result tuple back to the source data that contributed to it, which is beneficial for many applications (e.g., finding the conditions that triggered a security- or safety-related alert). Further, when data transmission or storage has to be minimized, as in edge computing and cyber-physical systems, it can help identify the source data to be prioritized. The memory and processing costs of fine-grained data provenance, affordable on high-end servers, can be prohibitive for the resource-constrained devices deployed in edge computing and cyber-physical systems. Motivated by this challenge, we present GeneaLog, a novel fine-grained data provenance technique for data streaming applications. Leveraging the logical dependencies of the data, GeneaLog takes advantage of cross-layer properties of the software stack and incurs a minimal, constant-size per-tuple overhead. Furthermore, it allows for a modular and efficient algorithmic implementation using only standard data streaming operators. This is particularly useful for distributed streaming applications, since the provenance processing can be executed at separate nodes, orthogonal to the data processing. We evaluate an implementation of GeneaLog using vehicular and smart grid applications, confirming that it efficiently captures fine-grained provenance data with minimal overhead.
Citations: 19
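The key property is a constant-size per-tuple provenance annotation carried through standard streaming operators. A small Python sketch of that idea follows, with plain generators standing in for stream operators and a hypothetical `trace` helper; it is not GeneaLog's implementation.

```python
# Sketch: every derived tuple carries a fixed number of parent references
# (constant-size overhead), so an alert can be traced back to its sources.
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class StreamTuple:
    value: dict
    # At most two parent references per tuple (one per operator input),
    # regardless of how deep the processing pipeline is.
    parents: Tuple[Optional["StreamTuple"], Optional["StreamTuple"]] = (None, None)

def source(readings):
    for r in readings:
        yield StreamTuple(value=r)

def filter_op(stream, predicate):
    for t in stream:
        if predicate(t.value):
            yield StreamTuple(value=t.value, parents=(t, None))

def trace(t):
    """Walk the parent pointers back to the raw source tuples."""
    if t.parents == (None, None):
        return [t.value]
    out = []
    for p in t.parents:
        if p is not None:
            out.extend(trace(p))
    return out

readings = [{"sensor": "s1", "speed": 42}, {"sensor": "s2", "speed": 131}]
alerts = filter_op(source(readings), lambda v: v["speed"] > 120)
for alert in alerts:
    print("alert", alert.value, "caused by", trace(alert))
```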
All-Spark: Using Simulation Tests Directly in Production Environments to Detect System Bottlenecks in Large-Scale Systems
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274809
Authors: Jialiang Lin, Jun Zhang, Yu Ding, Liping Zhang, Yin Han
Abstract: With the rapid growth of e-commerce, large-scale promotional activities have become commonplace. However, when the existing system cannot be adjusted efficiently to handle the traffic of the promotion period, which is hundreds of times the volume on normal days, it becomes a bottleneck that restricts the continued growth of the online business. Traditional capacity prediction methods have proven incapable of making accurate predictions for such special scenarios because of a variety of unpredictable system bottlenecks. Simulation testing at such a large scale in a completely separate test environment has a number of defects and limitations, such as the high cost of setting up the environment and the difficulty of testing the entire environment; moreover, bottlenecks found on test servers may differ from those on production servers. We investigated online simulation in the production environment and built a complete simulation test system called All-Spark. This solution solves a long-standing problem: simulation testing with large traffic in the production environment without causing any data pollution. The simulation tests reveal hundreds of bottlenecks under high workload pressure every year, eliminating hidden problems introduced by new applications. The final capacity evaluation deviated by less than 5% from the actual capacity, with an error rate below 2%; both are significant improvements over traditional prediction results. The solution also provides a framework that extends readily to scenarios other than stress testing.
Citations: 1
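One common way to realize "simulation tests in production without data pollution" is to tag test traffic and divert its writes to shadow storage while keeping the code path identical. The sketch below illustrates that general pattern with invented names (`x-stress-test` header, `shadow_` table prefix); it is not a description of All-Spark's actual interfaces.

```python
# Sketch: tagged test traffic whose writes go to shadow tables, so production
# data stays untouched while the real serving path is exercised.
SHADOW_PREFIX = "shadow_"

class Datastore:
    def __init__(self):
        self.tables = {}

    def write(self, table, row, is_test_traffic):
        # Divert test writes to a shadow table; real writes are unchanged.
        name = SHADOW_PREFIX + table if is_test_traffic else table
        self.tables.setdefault(name, []).append(row)

def handle_order(store, order, headers):
    is_test = headers.get("x-stress-test") == "1"   # tag travels with the request
    store.write("orders", order, is_test)

store = Datastore()
handle_order(store, {"id": 1, "sku": "A"}, {})                      # real user
handle_order(store, {"id": 2, "sku": "B"}, {"x-stress-test": "1"})  # simulated
print(store.tables)   # {'orders': [...], 'shadow_orders': [...]}
```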
RockFS
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274817
Authors: David R. Matos, M. Pardal, Georg Carle, M. Correia
Abstract: Cloud-backed file systems provide on-demand, high-availability, scalable storage. Their security may be improved with techniques such as erasure codes and secret sharing to fragment files and encryption keys in several clouds. Attacking the server-side of such systems involves penetrating one or more clouds, which can be extremely difficult. Despite all these benefits, a weak side remains: the client-side. The client devices store user credentials that, if stolen or compromised, may lead to confidentiality, integrity, and availability violations. In this paper we propose RockFS, a cloud-backed file system framework that aims to make the client-side of such systems resilient to attacks. RockFS protects data in the client device and allows undoing unintended file modifications.
Citations: 21
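The abstract mentions secret sharing of encryption keys across clouds. As a rough illustration only, here is an n-of-n XOR split of a key in Python; RockFS itself relies on proper k-of-n secret sharing (and erasure codes), which also tolerates unavailable shares.

```python
# Sketch: split a file-encryption key into n shares so that no single cloud
# (or stolen credential) reveals it; all n shares are needed to reconstruct.
import functools
import os
from typing import List

def split_key(key: bytes, n: int) -> List[bytes]:
    shares = [os.urandom(len(key)) for _ in range(n - 1)]
    # The last share is chosen so that XOR of all shares equals the key.
    last = bytes(functools.reduce(lambda a, b: a ^ b, chunk)
                 for chunk in zip(key, *shares))
    return shares + [last]

def recover_key(shares: List[bytes]) -> bytes:
    return bytes(functools.reduce(lambda a, b: a ^ b, chunk)
                 for chunk in zip(*shares))

key = os.urandom(32)
shares = split_key(key, 3)          # e.g., one share per cloud provider
assert recover_key(shares) == key   # all shares required to reconstruct
```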
PreDict
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274822
Authors: Christoph Doblander, Arash Khatayee, Hans A. Jacobsen
{"title":"PreDict","authors":"Christoph Doblander, Arash Khatayee, Hans A. Jacobsen","doi":"10.1145/3274808.3274822","DOIUrl":"https://doi.org/10.1145/3274808.3274822","url":null,"abstract":"Conclusion A high rate of clinical responses (complete/partial) to (CT)RT was registered. Post-operative complications resulted acceptable compared to literature data. pCR is associated with excellent survival also in these tumors as demonstrated in other neoplasms. The multidisciplinary approach is crucial to complete the combined treatment planned [(CT)RT+/- surgery]. In the future, predictive models could allow to select patients on the basis of their foreseen response","PeriodicalId":167957,"journal":{"name":"Proceedings of the 19th International Middleware Conference","volume":"59 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126703337","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
SpinStreams
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274814
Authors: G. Mencagli, Patrizio Dazzi, Nicolò Tonci
Abstract: The ubiquity of data streams in different fields of computing has led to the emergence of Stream Processing Systems (SPSs) used to program applications that extract insights from unbounded sequences of data items. Streaming applications demand various kinds of optimizations. Most of them are aimed at increasing throughput and reducing processing latency, and need cost models to analyze steady-state performance by capturing complex aspects like backpressure and bottleneck detection. The tendency in those systems is to support dynamic optimization of running applications, which, despite substantial run-time overhead, is unavoidable for unpredictable workloads. As an orthogonal direction, this paper proposes SpinStreams, a static optimization tool able to leverage cost models that programmers can use to detect and understand the inefficiencies of an initial application design. SpinStreams suggests optimizations for restructuring applications by generating code to be run on the SPS. We present the theory behind our optimizations, which cover more general classes of application structures than the ones studied in the literature so far. Then, we assess the accuracy of our models in Akka, an actor-based streaming framework providing a Java and Scala API.
Citations: 12
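A steady-state cost model of the kind described can be sketched from per-operator service times and selectivities: an operator whose utilization exceeds 1 is a bottleneck and needs replication. The operators and numbers below are illustrative assumptions, not values from the paper.

```python
# Sketch: steady-state bottleneck detection for a streaming pipeline.
# utilization = input rate * per-tuple service time; > 1 means backpressure.
import math
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    service_time_s: float   # mean time to process one input tuple
    selectivity: float      # output tuples produced per input tuple

def analyze(source_rate, pipeline):
    rate = source_rate       # tuples/s arriving at the next operator
    report = []
    for op in pipeline:
        utilization = rate * op.service_time_s
        replicas = max(1, math.ceil(utilization))   # suggested replication degree
        report.append((op.name, round(utilization, 2), replicas))
        rate *= op.selectivity
    return report

pipeline = [
    Operator("parse",     service_time_s=0.0001, selectivity=1.0),
    Operator("enrich",    service_time_s=0.0015, selectivity=0.8),
    Operator("aggregate", service_time_s=0.0004, selectivity=0.1),
]
for name, util, replicas in analyze(source_rate=2000, pipeline=pipeline):
    flag = "BOTTLENECK" if util > 1 else "ok"
    print(f"{name:9s} utilization={util:5.2f} replicas={replicas} {flag}")
```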
Aggressive Synchronization with Partial Processing for Iterative ML Jobs on Clusters
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274828
Authors: Shaoqi Wang, Wei Chen, Aidi Pi, Xiaobo Zhou
Abstract: Executing distributed machine learning (ML) jobs on Spark follows the Bulk Synchronous Parallel (BSP) model, where parallel tasks execute the same iteration at the same time and the generated updates are synchronized on the parameters only after all tasks have finished. However, the parallel tasks rarely have the same execution time due to sparse data, so the synchronization has to wait for tasks that finish late. Moreover, running Spark on heterogeneous clusters makes this even worse because of stragglers, where synchronization is significantly delayed by the slowest task. This paper attacks the fundamental BSP model that supports iterative ML jobs. We propose and develop a novel BSP-based Aggressive synchronization (A-BSP) model that builds on the convergence properties of iterative ML algorithms by allowing the algorithm to synchronize using updates generated from partially processed input data. Specifically, when the fastest task completes, A-BSP fetches the current updates generated by the remaining tasks, which have only partially processed their input data, to push for aggressive synchronization. Furthermore, unprocessed data is prioritized for processing in the subsequent iterations to preserve the algorithm's convergence rate. Theoretically, we prove algorithm convergence for gradient descent under the A-BSP model. We have implemented A-BSP as a lightweight BSP-compatible mechanism in Spark and performed evaluations with various ML jobs. Experimental results show that, compared to BSP, A-BSP speeds up execution by up to 2.36x. We have also extended A-BSP to the Petuum platform and compared it to the Stale Synchronous Parallel (SSP) and Asynchronous Parallel (ASP) models. A-BSP performs better than SSP and ASP for gradient-descent-based jobs. It also outperforms SSP for jobs on physically heterogeneous clusters.
Citations: 15
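The mechanism is concrete enough to simulate in a few lines: cut an iteration off as soon as the fastest worker finishes its partition, synchronize whatever partial updates exist, and move each worker's unprocessed items to the front of its partition for the next iteration. The toy gradient-descent simulation below illustrates that idea under made-up worker speeds; it is not the Spark or Petuum implementation.

```python
# Sketch: A-BSP-style iterations on a toy 1-D least-squares problem y ~ w * x.
import random

def gradient(w, batch):
    # Least-squares gradient for the model y ~ w * x.
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

def a_bsp_iteration(w, partitions, batch_size, lr, speeds):
    fastest = max(speeds)
    grads, processed = [], []
    for part, speed in zip(partitions, speeds):
        # Each worker only gets as far as its speed allows before the fastest
        # worker finishes its whole partition (partial processing).
        budget = max(batch_size, int(len(part) * speed / fastest))
        done, rest = part[:budget], part[budget:]
        for i in range(0, len(done), batch_size):
            grads.append(gradient(w, done[i:i + batch_size]))
        processed.append((done, rest))
    # Aggressive synchronization: average whatever updates exist right now.
    w -= lr * sum(grads) / len(grads)
    # Prioritize unprocessed data: leftovers go to the head of each partition.
    partitions = [rest + done for done, rest in processed]
    return w, partitions

random.seed(0)
data = [(x, 3.0 * x) for x in [random.uniform(-1, 1) for _ in range(400)]]
partitions = [data[i::4] for i in range(4)]        # 4 workers
w = 0.0
for _ in range(60):
    w, partitions = a_bsp_iteration(w, partitions, batch_size=10,
                                    lr=0.1, speeds=[1.0, 0.9, 0.7, 0.5])
print(round(w, 2))   # w approaches 3.0 despite partial processing
```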
Size Matters: Improving the Performance of Small Files in Hadoop
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274811
Authors: Salman Niazi, Mikael Ronström, Seif Haridi, J. Dowling
Abstract: The Hadoop Distributed File System (HDFS) is designed to handle massive amounts of data, preferably stored in very large files. The poor performance of HDFS in managing small files has long been a bane of the Hadoop community. In many production deployments of HDFS, almost 25% of the files are less than 16 KB in size and as much as 42% of all the file system operations are performed on these small files. We have designed an adaptive tiered storage using in-memory and on-disk tables stored in a high-performance distributed database to efficiently store and improve the performance of the small files in HDFS. Our solution is completely transparent, and it does not require any changes in the HDFS clients or the applications using the Hadoop platform. In experiments, we observed up to 61 times higher throughput in writing files, and for real-world workloads from Spotify our solution reduces the latency of reading and writing small files by a factor of 3.15 and 7.39 respectively.
Citations: 17
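At its core the design is a size-based routing decision between database tables and regular HDFS block storage, hidden behind an unchanged client API. A toy sketch of that routing follows; the thresholds and store names are assumptions for illustration, not the paper's actual cut-offs.

```python
# Sketch: tiered small-file storage. Tiny files go to an in-memory table,
# small files to an on-disk table, everything else to normal block storage.
KB = 1024

def choose_tier(size_bytes: int) -> str:
    if size_bytes <= 1 * KB:
        return "inmemory_table"      # stored alongside the file's metadata
    if size_bytes <= 64 * KB:
        return "ondisk_table"        # small-file table in the distributed DB
    return "hdfs_blocks"             # regular DataNode block storage

class TieredStore:
    def __init__(self):
        self.stores = {"inmemory_table": {}, "ondisk_table": {}, "hdfs_blocks": {}}

    def write(self, path: str, data: bytes) -> None:
        self.stores[choose_tier(len(data))][path] = data

    def read(self, path: str) -> bytes:
        # The client API is unchanged; the tier is an internal detail.
        for store in self.stores.values():
            if path in store:
                return store[path]
        raise FileNotFoundError(path)

fs = TieredStore()
fs.write("/logs/event.json", b"x" * 512)        # -> in-memory table
fs.write("/images/icon.png", b"x" * 30 * KB)    # -> on-disk table
fs.write("/video/clip.mp4", b"x" * 10**7)       # -> HDFS blocks
print(fs.read("/logs/event.json")[:4])
```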
SPADE
Proceedings of the 19th International Middleware Conference. Pub Date: 2018-11-26. DOI: 10.1145/3274808.3274815
Authors: Georgios Chatzopoulos, A. Dragojevic, R. Guerraoui
Abstract: Distributed transactions on modern RDMA clusters promise high throughput and low latency for scale-out workloads. As such, they can be particularly beneficial to large OLTP workloads, which require both. However, achieving good performance requires tuning the physical layout of the data store to the application and the characteristics of the underlying hardware. Manually tuning the physical design is error-prone, as well as time-consuming, and it needs to be repeated when the workload or the hardware change. In this paper we present SPADE, a physical design tuner for OLTP workloads in FaRM, a main memory distributed computing platform that leverages modern networks with RDMA capabilities. SPADE automatically decides on the partitioning of data, tunes the index and storage parameters, and selects the right mix of direct remote data accesses and function shipping to maximize performance. To achieve this, SPADE combines information derived from the workload and the schema with low-level hardware and network performance characteristics gathered through micro-benchmarks. Using SPADE, the tuned physical design achieves significant throughput and latency improvements over a manual design for two widely used OLTP benchmarks, TATP and TPC-C, sometimes using counter-intuitive tuning decisions.
Citations: 5
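One of the tuning decisions mentioned, direct remote data accesses versus function shipping, reduces to a cost comparison over measured latencies. The toy version below uses assumed constants in place of the micro-benchmark numbers a tuner like SPADE would gather; they are not FaRM measurements.

```python
# Sketch: pick between one-sided RDMA reads and function shipping for an
# operation that must follow a chain of dependent objects. Constants are
# illustrative stand-ins for micro-benchmarked latencies.
RDMA_READ_US = 2.0             # one-sided read round trip
RPC_US = 8.0                   # function-shipping round trip (request + reply)
SERVER_CPU_PER_OBJECT_US = 0.3

def direct_access_cost(objects_read: int) -> float:
    # Dependent reads cannot be batched, so the client pays one round trip each.
    return objects_read * RDMA_READ_US

def function_shipping_cost(objects_read: int) -> float:
    # One RPC, then the server walks the objects locally.
    return RPC_US + objects_read * SERVER_CPU_PER_OBJECT_US

def choose_plan(objects_read: int) -> str:
    return ("direct" if direct_access_cost(objects_read)
            <= function_shipping_cost(objects_read) else "ship")

for n in (1, 3, 5, 20):
    print(n, choose_plan(n))   # short lookups stay direct, long chains get shipped
```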