Lachesis: a middleware for customizing OS scheduling of stream processing queries
Dimitris Palyvos-Giannas, G. Mencagli, M. Papatriantafilou, Vincenzo Gulisano
Proceedings of the 22nd International Middleware Conference, 2021. https://doi.org/10.1145/3464298.3493407

Abstract: Data streaming applications in Cyber-Physical Systems enable high-throughput, low-latency transformations of raw data into value. The performance of such applications, run by Stream Processing Engines (SPEs), can be boosted through custom CPU scheduling. Previous schedulers in the literature require alterations to SPEs to control scheduling through user-level threads. While such alterations allow for fine-grained control, they hinder the adoption of these schedulers due to their high implementation cost and potential limitations on application semantics (e.g., blocking I/O). Motivated by the above, we explore the feasibility and benefits of custom scheduling without alterations to SPEs, instead orchestrating the OS scheduler (e.g., using nice and cgroup) to enforce the scheduling goals. We propose Lachesis, a standalone scheduling middleware, decoupled from any specific SPE, that can schedule multiple streaming applications running on one or many nodes, possibly under multiple SPEs. Our evaluation with real-world and synthetic workloads, several SPEs, and several hardware setups shows its benefits over default OS scheduling and other state-of-the-art schedulers: up to 75% higher throughput, and 1130x lower average latency once SPEs reach their peak processing capacity.

CAT: content-aware tracing and analysis for distributed systems
Tânia Esteves, Francisco Neves, Rui Oliveira, J. Paulo
Proceedings of the 22nd International Middleware Conference, 2021. https://doi.org/10.1145/3464298.3493396

Abstract: Tracing and analyzing the interactions and exchanges between nodes is fundamental to uncovering the performance, correctness, and dependability issues that are almost unavoidable in any complex distributed system. Existing monitoring tools acknowledge this importance but, so far, restrict tracing to the external attributes of I/O messages, thus missing the wealth of information in their content. We present CaT, a non-intrusive content-aware tracing and analysis framework that, through a novel similarity-based approach, can comprehensively trace and correlate the flow of network and storage requests from applications. By supporting multiple tracing tools, CaT can balance the coverage of captured events against the impact on application performance. An experimental evaluation with two widely used applications (TensorFlow and Apache Hadoop) shows how CaT can improve the analysis of distributed systems. The results also exemplify the trade-offs available for balancing tracing coverage and performance impact. Interestingly, in certain cases, full coverage of events can be attained with negligible performance and storage overhead.

SwitchFlow
Xiaofeng Wu, J. Rao, Wei Chen, Hang Huang, Chris H. Q. Ding, Heng Huang
Proceedings of the 22nd International Middleware Conference, 2021. https://doi.org/10.1145/3464298.3493391

Abstract: Accelerators, such as GPUs, are a scarce resource in deep learning (DL). Sharing a GPU effectively and efficiently improves hardware utilization as well as the experience of users, who may otherwise wait hours for GPU access until a long training job is done. Spatial and temporal multitasking on GPUs have been studied in the literature, but popular deep learning frameworks, such as TensorFlow and PyTorch, lack support for GPU sharing among multiple DL models, which are typically represented as computation graphs, heavily optimized by the underlying DL libraries, and run on a complex pipeline spanning CPU and GPU. Our study shows that GPU kernels spawned from computation graphs can barely execute simultaneously on a single GPU, and that time slicing may lead to low GPU utilization. This paper presents SwitchFlow, a scheduling framework for DL multitasking. It centers on two designs. First, instead of scheduling a computation graph as a whole, SwitchFlow schedules its subgraphs and prevents subgraphs from different models from running simultaneously on a GPU. This results in less interference and eliminates out-of-memory errors. Moreover, subgraphs running on different devices can overlap with each other, leading to a more efficient execution pipeline. Second, SwitchFlow maintains multiple versions of each subgraph. This allows subgraphs to be migrated across devices at low cost, thereby enabling low-latency preemption. Results on representative DL models show that SwitchFlow achieves up to an order of magnitude lower tail latency for inference requests collocated with a training job.

{"title":"Highly-available and consistent group collaboration at the edge with colony","authors":"Ilyas Toumlilt, P. Sutra, M. Shapiro","doi":"10.1145/3464298.3493405","DOIUrl":"https://doi.org/10.1145/3464298.3493405","url":null,"abstract":"Edge applications, such as gaming, cooperative engineering, or in-the-field information sharing, enjoy immediate response, autonomy and availability by distributing and replicating data at the edge. However, application developers and users demand the highest possible consistency guarantees, and specific support for group collaboration. To address this challenge, Colony guarantees Transactional Causal Plus Consistency (TCC+) globally, strengthened to Snapshot Isolation within edge groups. To help with scalability, fault tolerance and security, its logical communication topology is forest-like, with replicated roots in the core cloud, but with the flexibility to migrate a node or a group. Despite this hybrid approach, applications enjoy the same semantics everywhere in the topology. Our experiments show that local caching and peer groups improve throughput and response time significantly, performance is not affected in offline mode, and that migration is seamless.","PeriodicalId":154994,"journal":{"name":"Proceedings of the 22nd International Middleware Conference","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129947935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gossip consensus
Daniela Cason, Nenad Milosevic, Zarko Milosevic, F. Pedone
Proceedings of the 22nd International Middleware Conference, 2021. https://doi.org/10.1145/3464298.3493395

Abstract: Gossip-based consensus protocols have recently been proposed to confront the challenges faced by state machine replication in large, geographically distributed systems. It is unclear, however, to what extent consensus and gossip communication fit together. On the one hand, gossip communication has been shown to scale to large settings and to handle participant failures and message losses efficiently. On the other hand, gossip may slow down consensus. Moreover, gossip's inherent redundancy may be unnecessary, since consensus naturally accounts for participant failures and message losses. This paper investigates the suitability of gossip as a communication building block for consensus. We answer three questions: How much overhead does classic gossip introduce in consensus? Can we design consensus-friendly gossip protocols? Would more efficient gossip protocols still maintain the same reliability properties as classic gossip?

FaaSTCC: efficient transactional causal consistency for serverless computing
Taras Lykhenko, João Rafael Pinto Soares, Luís Rodrigues
Proceedings of the 22nd International Middleware Conference, 2021. https://doi.org/10.1145/3464298.3493392

Abstract: In this paper we study mechanisms that augment FaaS middleware with support for Transactional Causal Consistency (TCC). At first glance, it may seem that offering TCC to FaaS applications can be achieved trivially, given that the FaaS paradigm does not prevent applications from selecting a storage service with the properties they need. Unfortunately, most TCC storage services ensure consistency only for individual client processes, while a FaaS application is executed by multiple, independent worker processes. The workers therefore need to be coordinated, a task that can be a significant source of overhead. We propose a novel architecture to support TCC in FaaS, named FaaSTCC, that significantly reduces this coordination overhead. FaaSTCC achieves this goal by augmenting the workers with a caching layer and by implementing novel mechanisms that maximize cache usage. First, our storage layer offers the caching layer a promise that sets a horizon up to which the versions retrieved from the cache are guaranteed to be consistent. Second, in FaaSTCC, functions coordinate using snapshot intervals, which support lazy identification of the read snapshot and increase the chances of using cached values. We have implemented and experimentally evaluated FaaSTCC. Our results show that FaaSTCC achieves up to 5x lower average latency and 6x lower tail latency than previous work.

RamCast
Long Hoang Le, Mojtaba Eslahi-Kelorazi, Paulo R. Coelho, Fernando Pedone
Proceedings of the 22nd International Middleware Conference, 2021. https://doi.org/10.1145/3464298.3493393

Abstract: Atomic multicast is a group communication abstraction useful in the design of highly available and scalable systems. It allows messages to be addressed, reliably and consistently, to a subset of the processes in the system. Many atomic multicast algorithms have been designed for the message-passing system model. This paper presents RamCast, the first atomic multicast protocol for the shared-memory system model. We design RamCast by leveraging Remote Direct Memory Access (RDMA) technology and by carefully combining techniques from message-passing and shared-memory systems. We show experimentally that RamCast outperforms current state-of-the-art atomic multicast protocols, increasing throughput by up to 3.7x and reducing latency by up to 28x.

{"title":"Experience Paper: Towards enhancing cost efficiency in serverless machine learning training","authors":"Marc Sánchez Artigas, Pablo Gimeno Sarroca","doi":"10.1145/3464298.3494884","DOIUrl":"https://doi.org/10.1145/3464298.3494884","url":null,"abstract":"Function-as-a-Service (FaaS) has raised a growing interest in how to \"tame\" serverless to enable domain-specific use cases such as data-intensive applications and machine learning (ML), to name a few. Recently, several systems have been implemented for training ML models. Certainly, these research articles are significant steps in the correct direction. However, they do not completely answer the nagging question of when serverless ML training can be more cost-effective compared to traditional \"serverful\" computing. To help in this task, we propose MLLess, a FaaS-based ML training prototype built atop IBM Cloud Functions. To boost cost-efficiency, MLLess implements two key optimizations: a significance filter and a scale-in auto-tuner, and leverages them to specialize model training to the FaaS model. Our results certify that MLLess can be 15X faster than serverful ML systems [24] at a lower cost for ML models (such as sparse logistic regression and matrix factorization) that exhibit fast convergence.","PeriodicalId":154994,"journal":{"name":"Proceedings of the 22nd International Middleware Conference","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132516778","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Snarl: entangled merkle trees for improved file availability and storage utilization","authors":"Racin Nygaard, Vero Estrada-Galiñanes, H. Meling","doi":"10.1145/3464298.3493397","DOIUrl":"https://doi.org/10.1145/3464298.3493397","url":null,"abstract":"In cryptographic decentralized storage systems, files are split into chunks and distributed across a network of peers. These storage systems encode files using Merkle trees, a hierarchical data structure that provides integrity verification and lookup services. A Merkle tree maps the chunks of a file to a single root whose hash value is the file's content-address. A major concern is that even minor network churn can result in chunks becoming irretrievable due to the hierarchical dependencies in the Merkle tree. For example, chunks may be available but can not be found if all peers storing the root fail. Thus, to reduce the impact of churn, a decentralized replication process typically stores each chunk at multiple peers. However, we observe that this process reduces the network's storage utilization and is vulnerable to cascading failures as some chunks are replicated 10X less than others. We propose Snarl, a novel storage component that uses a variation of alpha entanglement codes to add user-controlled redundancy to address these problems. Our contributions are summarized as follows: 1) the design of an entangled Merkle tree, a resilient data structure that reduces the impact of hierarchical dependencies, and 2) the Snarl prototype to improve file availability and storage utilization in a real-world storage network. We evaluate Snarl using various failure scenarios on a large cluster running the Ethereum Swarm network. Our evaluation shows that Snarl increases storage utilization by 5X in Swarm with improved file availability. File recovery is bandwidth-efficient and uses less than 2X chunks on average in scenarios with up to 50% of total chunk loss.","PeriodicalId":154994,"journal":{"name":"Proceedings of the 22nd International Middleware Conference","volume":"108 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122747826","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}