DEBS grand challenge: real-time detection of air quality improvement with Apache Flink

Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems. Pub Date: 2021-06-28. DOI: 10.1145/3465480.3466930

J. Marić, K. Pripužić, Martina Antonić


Citations: 3

Abstract

The DEBS Grand Challenge 2021 asks for a solution that detects the areas in which the air quality index (AQI) improved the most compared to the previous year. The solution must run two given continuous queries in parallel on the incoming sensor data stream, returning 1) the top 50 cities in terms of AQI improvement, together with their current AQIs, and 2) a histogram of the longest streaks of good AQI. The incoming data is accessed through an API that provides streaming sensor measurements in batches. We present our solution based on Apache Flink, a distributed stream processing framework for clusters. We opted for Flink because its applications can easily be scaled horizontally by adding computation nodes and vertically by increasing the available resources. Flink allows us to divide the given queries into smaller tasks that run concurrently on different nodes, which reduces the overall processing time and thus improves the performance of our solution. In more detail, the following performance-intensive tasks are run in parallel on distributed nodes: 1) retrieving measurement batches, 2) assigning a city to each measurement, and 3) calculating the air quality index per city. We also discuss the main optimizations we used to improve performance and present an experimental evaluation of our solution.
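As an illustration of what the two queries compute, the following is a minimal sketch in plain Python, not the Flink implementation described in the paper. It rests on several assumptions that the abstract does not state: "improvement" is taken as the previous-year AQI minus the current AQI, "good" is taken as an AQI of at most 50 (the US EPA "Good" band), and the histogram uses equal-width buckets over each city's longest streak length. All function and parameter names are hypothetical.

```python
import heapq
from collections import Counter

GOOD_AQI_MAX = 50  # assumption: "good" air quality means AQI <= 50


def top_improved(current, previous, k=50):
    """Query 1 sketch: rank cities by AQI improvement.

    `current` and `previous` map city name -> AQI. Improvement is assumed
    to be previous-year AQI minus current AQI (a lower AQI is better).
    Returns (city, improvement, current AQI) tuples, best first.
    """
    improvement = {
        city: previous[city] - aqi
        for city, aqi in current.items()
        if city in previous  # skip cities without a previous-year value
    }
    best = heapq.nlargest(k, improvement.items(), key=lambda item: item[1])
    return [(city, delta, current[city]) for city, delta in best]


def longest_good_streak(aqi_series):
    """Length of the longest run of consecutive 'good' AQI readings."""
    longest = run = 0
    for aqi in aqi_series:
        run = run + 1 if aqi <= GOOD_AQI_MAX else 0
        longest = max(longest, run)
    return longest


def streak_histogram(series_by_city, bucket_width=2):
    """Query 2 sketch: count cities per bucket of longest-streak length.

    Returns {(bucket_start, bucket_end): number_of_cities}, with inclusive
    bounds. The bucket width is an illustrative assumption.
    """
    counts = Counter(
        longest_good_streak(series) // bucket_width
        for series in series_by_city.values()
    )
    return {
        (b * bucket_width, (b + 1) * bucket_width - 1): n
        for b, n in sorted(counts.items())
    }


if __name__ == "__main__":
    current = {"A": 40, "B": 90, "C": 60}
    previous = {"A": 80, "B": 95, "C": 55}
    print(top_improved(current, previous, k=2))
    print(streak_histogram({"A": [30, 45, 70, 20, 10, 50, 51], "B": [80, 90]}))
```

Both functions operate on per-city values, which is why this kind of work can be partitioned by city and run concurrently on different nodes, as the abstract describes for the per-city AQI calculation.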

