DEBS grand challenge: real-time detection of air quality improvement with Apache Flink

J. Marić, K. Pripužić, Martina Antonić
Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems, published 2021-06-28
DOI: 10.1145/3465480.3466930
Citations: 3

Abstract

The topic of the DEBS Grand Challenge 2021 is to develop a solution for detecting areas in which the air quality index (AQI) improved the most compared to the previous year. The solution must run two given continuous queries in parallel on the incoming sensor data stream, returning: 1) the top 50 cities in terms of AQI improvement, together with their current AQIs, and 2) a histogram of the longest streaks of good AQI. The incoming data is accessed through an API that provides streaming sensor measurements in batches. We present our solution based on Apache Flink, a distributed stream processing framework for clusters. We opted for Flink because its applications can easily be scaled horizontally, by adding computation nodes, and vertically, by increasing the available resources. Flink allows us to divide the given queries into smaller tasks that can run concurrently on different nodes, reducing the overall processing time and thus improving the performance of our solution. In more detail, the following performance-intensive tasks run in parallel on distributed nodes: 1) retrieving measurement batches, 2) assigning a city to each measurement, and 3) calculating the air quality index per city. We also discuss the main optimizations we used to improve performance and present an experimental evaluation of our solution.
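The city-assignment task and the two queries described above can be illustrated with a minimal, single-process Python sketch. It is not the authors' Flink implementation, and several details are assumptions: cities are given as hypothetical (lon, lat) polygons, per-city current and previous-year AQIs are assumed to be already aggregated into a hypothetical `CityAqi` record, "good" AQI is assumed to mean at most 50 (the upper bound of the US EPA "Good" category), and all function names are illustrative. City assignment uses even-odd ray casting on planar coordinates, which is a simplification.

```python
import heapq
from dataclasses import dataclass

# Assumption: "good" AQI means at most 50, the upper bound of the US EPA "Good" band.
GOOD_AQI_MAX = 50


def point_in_polygon(lon, lat, polygon):
    """Even-odd ray casting: True if (lon, lat) lies inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the ring is closed implicitly.
    Coordinates are treated as planar, a simplification for small areas.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_city(lon, lat, city_polygons):
    """City-assignment sketch: first city whose (hypothetical) polygon contains the point."""
    for city, polygon in city_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return city
    return None


@dataclass
class CityAqi:
    city: str
    current_aqi: float   # assumed: already aggregated for the current window
    previous_aqi: float  # assumed: same window one year earlier


def top_improved(cities, k=50):
    """Query 1 sketch: k cities with the largest AQI drop (lower AQI = cleaner air)."""
    return heapq.nlargest(k, cities, key=lambda c: c.previous_aqi - c.current_aqi)


def update_streak(streak_start, city, aqi, ts):
    """Track when each city's current run of good AQI began."""
    if aqi <= GOOD_AQI_MAX:
        streak_start.setdefault(city, ts)  # keep the start of an ongoing run
    else:
        streak_start.pop(city, None)       # a bad reading ends the run


def streak_histogram(streak_start, now, bucket_width, num_buckets):
    """Query 2 sketch: count cities per bucket of current good-AQI streak length.

    Cities without an active streak are not counted; the last bucket
    also absorbs streaks longer than the histogram range.
    """
    counts = [0] * num_buckets
    for start in streak_start.values():
        bucket = min(int((now - start) // bucket_width), num_buckets - 1)
        counts[bucket] += 1
    return counts
```

For example, with `square = [(0, 0), (10, 0), (10, 10), (0, 10)]`, `point_in_polygon(5, 5, square)` returns `True` and `point_in_polygon(15, 5, square)` returns `False`. In a Flink job, per-city state such as `streak_start` would more naturally live in keyed state partitioned by city, so that the work spreads across parallel operator instances; a plain dict stands in for that here.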