{"title":"Scalable analytics of air quality batches with Apache Spark and Apache Sedona","authors":"Rim Moussa","doi":"10.1145/3465480.3466931","DOIUrl":"https://doi.org/10.1145/3465480.3466931","url":null,"abstract":"According to the American National Institute of Environmental Health Sciences (NIEHS), air pollutants are harmful to the health of humans and other living beings, and cause damage to the climate and to the ecosystem by polluting lakes, streams, and soils. Recent developments in sensor technology, and Internet of Things (IoT) technologies provide an opportunity to use sensor networks to measure air quality, in real time, at a large number of locations. The adoption and deployment of IoT technologies for sensing air quality raises a challenging research agenda related to big data processing, such as, data analysis, scalable architectures, and algorithms for best managing and processing IoT data at different edges in the IoT ecosystem. In response to the DEBS'2021 contest, we design and implement a scalable solution for comparing previous year and current year air quality indexes for German Cities, as well as the calculus of cities' longest streaks of good air quality. Our solution is designed to be scalable. It's based on primo Apache Spark - an open-source unified analytics engine for large-scale data processing, and secundo Apache Sedona for creating spatial indexes, and performing spatial operations over large-scale spatial data.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130685849","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DEBS grand challenge: real-time detection of air quality improvement with Apache Flink","authors":"J. Marić, K. Pripužić, Martina Antonić","doi":"10.1145/3465480.3466930","DOIUrl":"https://doi.org/10.1145/3465480.3466930","url":null,"abstract":"The topic of the DEBS Grand Challenge 2021 is to develop a solution for detecting areas in which the air quality index (AQI) improved the most when compared to the previous year. The solution must run two given continuous queries in parallel on the incoming sensor data stream which must return the following: 1) a top 50 cities in terms of AQI improvement with their current AQIs and 2) a histogram of the longest streaks of good AQI. The incoming data is accessed through an API which provides streaming sensor measurements in batches. We present our solution based on Apache Flink, a distributed stream processing framework for the cluster. We opted for Flink since its applications can easily be scaled horizontally and vertically by adding computation nodes or increasing available resources, respectively. Flink allows us to divide the given queries into smaller tasks which can be run concurrently on different nodes in order to reduce the overall processing time and thus improve the performance of our solution. In more detail, the following performance intensive tasks are run in parallel on distributed nodes: 1) retrieving measurement batches, 2) assigning a city to each measurement and 3) calculating air quality index per city. We also discuss the main optimizations we have used to improve the performance and present an experimental evaluation of our solution.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117083344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Thinking in events: from databases to distributed collaboration software","authors":"Martin Kleppmann","doi":"10.1145/3465480.3467835","DOIUrl":"https://doi.org/10.1145/3465480.3467835","url":null,"abstract":"In this keynote I give a subjective but systematic overview of the landscape of distributed event-based systems, with an emphasis on two areas I have worked on over the last decade: large-scale stream processing with Apache Kafka and associated tools, and real-time collaboration software in the style of Google Docs. While these may seem at first glance to be very different topics, there are also important points of overlap. This paper lays out a taxonomy of event-based systems that shows where their commonalities and differences lie. It also highlights some of the key trade-offs that arise in the implementation of event-based systems, drawing both from distributed systems theory and from experience of their practical deployment. Finally, the paper outlines a number of open research problems in this field.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116064131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
A. Benczúr, Ferenc Béres, Domokos M. Kelen, Róbert Pálovics
{"title":"Tutorial on graph stream analytics","authors":"A. Benczúr, Ferenc Béres, Domokos M. Kelen, Róbert Pálovics","doi":"10.1145/3465480.3468293","DOIUrl":"https://doi.org/10.1145/3465480.3468293","url":null,"abstract":"In this short tutorial, we cover recent methods to analyze and model network data accessible as a stream of edges, such as interactions in a social network service, or any other graph database with real-time updates from a stream. First we introduce the data streaming computational model and give examples of the so-called temporal networks. We describe how traditional graph properties (sampling, subgraph counting, graph query evaluation, etc.), low-rank approximation, network embedding, link prediction, and centrality algorithms can be implemented and updated while the edge stream is processed. As an outlook, we discuss among others distributed data stream processing engines and concept drift detection in streams. For most part, we provide sample data and implementation as Python codes packaged in a Docker image.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"36 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130443228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards autonomous semantic stream fusion for distributed video streams","authors":"M. Duc, Anh Le-Tuan, M. Hauswirth, Danh Le-Phuoc","doi":"10.1145/3465480.3467837","DOIUrl":"https://doi.org/10.1145/3465480.3467837","url":null,"abstract":"Video streams are becoming ubiquitous in smart cities and traffic monitoring. Recent advances in computer vision with deep neural networks enable querying a rich set of visual features from these video streams. However, it is challenging to deploy these queries on edge devices due to the resource intensive nature of the computing operations of this sort. Hence, this paper will demonstrate our approach in pushing these computing operations closer to the video stream sources via autonomous stream fusion agents. These agents will facilitate an edge computing paradigm that enables edge devices to utilize its computing resources to serve federated queries over video streams. Our demonstration shows that edge devices can significantly alleviate the bottleneck of the centralized server in dealing with distributed video streams.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116949060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Solving the 2021 DEBS grand challenge using Apache Flink","authors":"Mina N. F. Morcos, Baiqing Lyu, S. Kalathur","doi":"10.1145/3465480.3466929","DOIUrl":"https://doi.org/10.1145/3465480.3466929","url":null,"abstract":"The DEBS Grand Challenge is an annual event in which different event-based systems compete to solve a real-world problem. For the year 2021, the challenge is computing information given air quality sensor data. Due to the pandemic many factories are forced to close down, and the aim is to find out the cities that have improved the most in air quality and cities that have achieved the longest sequence of good AQI values. This paper aims to solve the above challenges using Apache Flink, an open-source, unified stream-processing and batch-processing framework developed by the Apache Software Foundation.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125785193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The synergy of complex event processing and tiny machine learning in industrial IoT","authors":"Haoyu Ren, Darko Anicic, T. Runkler","doi":"10.1145/3465480.3466928","DOIUrl":"https://doi.org/10.1145/3465480.3466928","url":null,"abstract":"Focusing on comprehensive networking, the Industrial Internet-of-Things (IIoT) facilitates efficiency and robustness in factory operations. Various intelligent sensors play a central role, as they generate a vast amount of real-time data that can provide insights into manufacturing. Complex event processing (CEP) and machine learning (ML) have been developed actively in the last years in IIoT to identify patterns in heterogeneous data streams and fuse raw data into tangible facts. In a traditional compute-centric paradigm, the raw field data are continuously sent to the cloud and processed centrally. As IIoT devices become increasingly pervasive, concerns are raised since transmitting such an amount of data is energy-intensive, vulnerable to be intercepted, and subjected to high latency. Decentralized on-device ML and CEP provide a solution where data is processed primarily on edge devices. Thus communications can be minimized. However, this is no mean feat because most IIoT edge devices are resource-constrained with low power consumption. This paper proposes a framework that exploits ML and CEP's synergy at the edge in distributed sensor networks. By leveraging tiny ML and μCEP, we now shift the computation from the cloud to the resource-constrained IIoT devices and allow users to adapt on-device ML models and CEP reasoning rules flexibly on the fly. Lastly, we demonstrate the proposed solution and show its effectiveness and feasibility using an industrial use case of machine safety monitoring.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"40 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122720303","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}