Scalable analytics of air quality batches with Apache Spark and Apache Sedona

Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems. Pub Date: 2021-06-28. DOI: 10.1145/3465480.3466931

Rim Moussa


Citations: 1

Abstract

According to the American National Institute of Environmental Health Sciences (NIEHS), air pollutants are harmful to the health of humans and other living beings, and damage the climate and the ecosystem by polluting lakes, streams, and soils. Recent developments in sensor technology and Internet of Things (IoT) technologies make it possible to use sensor networks to measure air quality in real time at a large number of locations. The adoption and deployment of IoT technologies for sensing air quality raises a challenging research agenda in big data processing, covering data analysis, scalable architectures, and algorithms for managing and processing IoT data at different edges of the IoT ecosystem. In response to the DEBS 2021 contest, we design and implement a scalable solution that compares previous-year and current-year air quality indexes for German cities and computes each city's longest streak of good air quality. The solution is based on two components: first, Apache Spark, an open-source unified analytics engine for large-scale data processing; and second, Apache Sedona, which is used to create spatial indexes and perform spatial operations over large-scale spatial data.
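
The two computations named in the abstract, the longest streak of good air quality and the previous-year versus current-year comparison, can be illustrated with a minimal sketch. This is plain Python rather than Spark or Sedona, and it is not the author's implementation: the function names, the threshold of 50 for a "good" index value, and the sample readings are all assumptions made for illustration.

```python
from itertools import groupby

# Assumed threshold: index values at or below this count as "good".
GOOD_AQI_MAX = 50


def longest_good_streak(aqi_values, good_max=GOOD_AQI_MAX):
    """Return the length of the longest run of consecutive good readings."""
    best = 0
    # groupby splits the time-ordered readings into runs of good / not good.
    for is_good, run in groupby(aqi_values, key=lambda v: v <= good_max):
        if is_good:
            best = max(best, sum(1 for _ in run))
    return best


def year_over_year_change(current, previous):
    """Return mean(previous) - mean(current); positive means a lower (better) index now."""
    return sum(previous) / len(previous) - sum(current) / len(current)


# Made-up, time-ordered readings per city (illustrative data only).
readings = {
    "Berlin": [42, 48, 55, 30, 35, 40, 61],
    "Munich": [70, 45, 44, 80],
}
streaks = {city: longest_good_streak(vals) for city, vals in readings.items()}
print(streaks)  # {'Berlin': 3, 'Munich': 2}
```

In a distributed setting the same per-city logic would presumably run after sensor readings have been assigned to cities by a spatial join, which is the kind of operation the abstract attributes to Apache Sedona.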
