{"title":"Workload-driven systems optimization: things are getting real!","authors":"C. Curino","doi":"10.1145/3465480.3466918","DOIUrl":"https://doi.org/10.1145/3465480.3466918","url":null,"abstract":"In this talk, I will describe how workload-driven system optimization has quickly evolved from a researcher's dream into a production reality. In particular, I will provide a high-level overview of a large collection of efforts that build upon careful telemetry collection and analysis to tune and optimize large-scale distributed systems in production at Microsoft. While the opportunities are staggering (e.g., improving system efficiencies to the tune of 10s of millions of dollars per year), the challenges are equally daunting. I will discuss some of the ways we handle the challenges at MS and point at open problems along the way.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"109 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115396876","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards event-driven decentralized marketplaces on the blockchain","authors":"Akash Pateria, Kemafor Anyanwu","doi":"10.1145/3465480.3466921","DOIUrl":"https://doi.org/10.1145/3465480.3466921","url":null,"abstract":"Blockchains have become a popular technology for lowering the trust-tax burden between transacting parties that cannot necessarily trust each other. They are used as substitutes for the centralized authorities typically incorporated in transactional workflows to perform verification tasks and have the advantage of being objective and incorruptible. For applications such as supply chain marketplaces, auxiliary functionalities beyond the core blockchain roles of recording and validating transactions, such as event detection, are important for enabling application participants to be responsive to business conditions. Unfortunately, existing blockchain event frameworks are immature, syntactic, inflexible and not expressive enough for many application needs. In this paper, we propose an approach that involves an event model which \"semantifies blockchain transactions\" and an implementation architecture that integrates an open-source blockchain database BigChainDB with a semantic engine and publish-subscribe messaging platform. Finally, we model and simulate a use-case inspired by the manufacturing domain and present usability and preliminary performance results that demonstrate the discriminatory ability of the semantics-enabled event model.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129645033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine learning over static and dynamic relational data","authors":"A. Kara, M. Nikolic, Dan Olteanu, Haozhe Zhang","doi":"10.1145/3465480.3467843","DOIUrl":"https://doi.org/10.1145/3465480.3467843","url":null,"abstract":"This tutorial overviews principles behind recent works on training and maintaining machine learning models over relational data, with an emphasis on the exploitation of the relational data structure to improve the runtime performance of the learning task. The tutorial has the following parts: (1) Database research for data science (2) Three main ideas to achieve performance improvements (2.1) Turn the ML problem into a DB problem (2.2) Exploit structure of the data and problem (2.3) Exploit engineering tools of a DB researcher (3) Avenues for future research","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129072580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A distributed database system for event-based microservices","authors":"Rodrigo Laigner, Yongluan Zhou, M. V. Salles","doi":"10.1145/3465480.3466919","DOIUrl":"https://doi.org/10.1145/3465480.3466919","url":null,"abstract":"Microservice architectures are an emerging industrial approach to build large scale and event-based systems. In this architectural style, an application is functionally partitioned into several small and autonomous building blocks, so-called microservices, communicating and exchanging data with each other via events. By pursuing a model where fault isolation is enforced at microservice level, each microservice manages its own database, thus database systems are not shared across microservices. Developers end up encoding substantial data management logic in the application-tier and encountering a series of challenges on enforcing data integrity and maintaining data consistency across microservices. In this vision paper, we argue that there is a need to rethink how database systems can better support microservices and relieve the burden of handling complex data management tasks faced by programmers. We envision the design and research opportunities for a novel distributed database management system targeted at event-driven microservices.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124106863","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An event driven framework for smart contract execution","authors":"Mudabbir Kaleem, Keshav Kasichainula, Rabimba Karanjai, Lei Xu, Zhimin Gao, Lin Chen, W. Shi","doi":"10.1145/3465480.3466924","DOIUrl":"https://doi.org/10.1145/3465480.3466924","url":null,"abstract":"Blockchain-based smart contract platforms have traditionally employed the transaction-driven execution model. This paper presents an alternate framework for blockchain-based smart contract execution called EDSC. Our platform design presents a novel approach to tackle the scalability and performance challenges facing the smart contract ecosystem. We base EDSC's design on the Ethereum template, and it can be readily implemented for other existing smart contract platforms. To evaluate our design, we perform an experimental implementation using the Ethereum client. Our experiments with performance modeling show, on average, a 2.2 to 4.6 times reduced total latency of event-triggered smart contracts, demonstrating the effectiveness of the design in supporting time-sensitive applications. Additionally, we comment on the design's potential security aspects and demonstrate its utility by discussing potential use cases.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127073004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time big data stream analytics and complex event detection: modular visual framework, data science platform, and industry applications","authors":"R. Klinkenberg","doi":"10.1145/3465480.3468676","DOIUrl":"https://doi.org/10.1145/3465480.3468676","url":null,"abstract":"In many industry applications, larger and larger amounts of data become available, making it possible to gain deeper insights, to generate more accurate forecasts, to optimize and automate processes, and to thereby create significant value. Often the data is not static, but arrives continuously in large data streams, which ideally are processed and leveraged in real-time. This talk will present a modular and flexible platform for real-time big data stream processing, complex event detection, data science and machine learning with an easy-to-use visual process design user interface, seamlessly integrating the most relevant big data and stream processing frameworks (Hadoop, Spark, Spark Streaming, Kafka, Flink, etc.) within one unifying platform and user interface, based on the widely used data science platform RapidMiner. This talk will also provide an overview of applications of this framework across many industries like machine failure prediction and prevention and predictive maintenance in industrial production in the manufacturing industry, critical event detection, prediction and prevention in the chemical industry, product quality prediction and optimization as well as energy consumption and cost reduction in the steel and metal industry, data-driven process optimization in the food and beverage industry, various use cases in the automotive and aviation industry, maritime data analysis to detect complex events like piracy or illegal fishing and to avoid collisions, drug effectiveness prediction for cancer drug development and biomedical research, financial time series analysis and forecasting for the investment industry. The latter three use cases are addressed in the European research projects INFORE, which will also be shortly introduced.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114898120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed transactions on serverless stateful functions","authors":"M. de Heus, Kyriakos Psarakis, Marios Fragkoulis, Asterios Katsifodimos","doi":"10.1145/3465480.3466920","DOIUrl":"https://doi.org/10.1145/3465480.3466920","url":null,"abstract":"Serverless computing is currently the fastest-growing cloud services segment. The most prominent serverless offering is Function-as-a-Service (FaaS), where users write functions and the cloud automates deployment, maintenance, and scalability. Although FaaS is a good fit for executing stateless functions, it does not adequately support stateful constructs like microservices and scalable, low-latency cloud applications, mainly because it lacks proper state management support and the ability to perform function-to-function calls. Most importantly, executing transactions across stateful functions remains an open problem. In this paper, we introduce a programming model and implementation for transaction orchestration of stateful serverless functions. Our programming model supports serializable distributed transactions with two-phase commit, as well as relaxed transactional guarantees with Sagas. We design and implement our programming model on Apache Flink StateFun. We choose to build our solution on top of StateFun in order to leverage Flink's exactly-once processing and state management guarantees. We base our evaluation on the YCSB benchmark, which we extended with transactional operations and adapted for the SFaaS programming model. Our experiments show that our transactional orchestration adds 10% overhead to the original system and that Sagas can achieve up to 34% more transactions per second than two-phase commit transactions at a sub-200ms latency.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129676598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards creating a generalized complex event processing operator using FlinkCEP: architecture & benchmark","authors":"E. Kougioumtzi, Antonios Kontaxakis, Antonios Deligiannakis, Y. Kotidis","doi":"10.1145/3465480.3467841","DOIUrl":"https://doi.org/10.1145/3465480.3467841","url":null,"abstract":"FlinkCEP, the Complex Event Processing (CEP) API of the Flink Big Data platform, scales out pattern detection to a number of machines in a computer cluster or cloud. The high expressive power of FlinkCEP's language comes at the cost of cumbersome parameterization of the patterns to be monitored, thus limiting usability. In this work, we build a novel, logical CEP operator that receives as input specifications of CEP queries in the form of extended regular expressions and automatically re-writes them to FlinkCEP programs. We also initiate a benchmarking effort on FlinkCEP.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127914593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast recovery of correlated failures in distributed stream processing engines","authors":"Li Su, Yongluan Zhou","doi":"10.1145/3465480.3466923","DOIUrl":"https://doi.org/10.1145/3465480.3466923","url":null,"abstract":"In a large-scale cluster, correlated failures usually involve a number of nodes failing simultaneously. Although correlated failures occur infrequently, they have a significant effect on systems' availability, especially for streaming applications that require real-time analysis, as repairing the failed nodes or acquiring additional ones would take a significant amount of time. Most state-of-the-art distributed stream processing systems (DSPSs) focus on recovering individual failures and do not consider the optimization for recovering correlated failures. In this work, we propose an incremental and query-centric recovery paradigm where the recovery of failed operator partitions would be carefully scheduled based on the current availability of resources, such that the outputs of queries can be recovered as early as possible. By analyzing the existing recovery techniques, we identify the challenges and propose a fault-tolerance framework that can support incremental recovery with minimum overhead during the system's normal execution. We also formulate the new problem of recovery scheduling under correlated failures and design algorithms to optimize the recovery latency with a performance guarantee. A comprehensive set of experiments is conducted to study the validity of our proposal.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"101 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122900613","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Building an end-to-end BAD application","authors":"Shahrzad Haji Amin Shirazi, M. Carey, V. Tsotras","doi":"10.1145/3465480.3467840","DOIUrl":"https://doi.org/10.1145/3465480.3467840","url":null,"abstract":"Traditional big data infrastructures are passive in nature, passively answering user requests to process and return data. In many applications, however, users not only need to analyze data, but also to subscribe to and actively receive data of interest, based on their subscriptions. Their interest may include the incoming data's content as well as its relationships to other data. Moreover, data delivered to subscribers may need to be enriched with additional relevant and actionable information. To address this Big Active Data (BAD) challenge we have advocated the need for building scalable BAD systems that continuously and reliably capture big data while enabling timely and automatic delivery of relevant and possibly enriched information to a large pool of subscribers. In this demo we showcase how to build an end-to-end active application using a BAD system and a standard email broker for data delivery. This includes enabling users to register their interests with the BAD system, ingesting and monitoring data, and producing customized results and delivering them to the appropriate subscribers. Through this example we demonstrate that even complex active data applications can be created easily and scale to many users, considerably limiting the effort of application developers, if a BAD approach is taken.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130382373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}