{"title":"使用Apache Spark和Apache Sedona对空气质量批次进行可扩展分析","authors":"Rim Moussa","doi":"10.1145/3465480.3466931","DOIUrl":null,"url":null,"abstract":"According to the American National Institute of Environmental Health Sciences (NIEHS), air pollutants are harmful to the health of humans and other living beings, and cause damage to the climate and to the ecosystem by polluting lakes, streams, and soils. Recent developments in sensor technology, and Internet of Things (IoT) technologies provide an opportunity to use sensor networks to measure air quality, in real time, at a large number of locations. The adoption and deployment of IoT technologies for sensing air quality raises a challenging research agenda related to big data processing, such as, data analysis, scalable architectures, and algorithms for best managing and processing IoT data at different edges in the IoT ecosystem. In response to the DEBS'2021 contest, we design and implement a scalable solution for comparing previous year and current year air quality indexes for German Cities, as well as the calculus of cities' longest streaks of good air quality. Our solution is designed to be scalable. It's based on primo Apache Spark - an open-source unified analytics engine for large-scale data processing, and secundo Apache Sedona for creating spatial indexes, and performing spatial operations over large-scale spatial data.","PeriodicalId":217173,"journal":{"name":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","volume":"66 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Scalable analytics of air quality batches with Apache Spark and Apache Sedona\",\"authors\":\"Rim Moussa\",\"doi\":\"10.1145/3465480.3466931\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"According to the American National Institute of Environmental Health Sciences (NIEHS), air pollutants are harmful to the health of humans and other living beings, and cause damage to the climate and to the ecosystem by polluting lakes, streams, and soils. Recent developments in sensor technology, and Internet of Things (IoT) technologies provide an opportunity to use sensor networks to measure air quality, in real time, at a large number of locations. The adoption and deployment of IoT technologies for sensing air quality raises a challenging research agenda related to big data processing, such as, data analysis, scalable architectures, and algorithms for best managing and processing IoT data at different edges in the IoT ecosystem. In response to the DEBS'2021 contest, we design and implement a scalable solution for comparing previous year and current year air quality indexes for German Cities, as well as the calculus of cities' longest streaks of good air quality. Our solution is designed to be scalable. It's based on primo Apache Spark - an open-source unified analytics engine for large-scale data processing, and secundo Apache Sedona for creating spatial indexes, and performing spatial operations over large-scale spatial data.\",\"PeriodicalId\":217173,\"journal\":{\"name\":\"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems\",\"volume\":\"66 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3465480.3466931\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 15th ACM International Conference on Distributed and Event-based Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3465480.3466931","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Scalable analytics of air quality batches with Apache Spark and Apache Sedona
According to the American National Institute of Environmental Health Sciences (NIEHS), air pollutants are harmful to the health of humans and other living beings, and cause damage to the climate and to the ecosystem by polluting lakes, streams, and soils. Recent developments in sensor technology, and Internet of Things (IoT) technologies provide an opportunity to use sensor networks to measure air quality, in real time, at a large number of locations. The adoption and deployment of IoT technologies for sensing air quality raises a challenging research agenda related to big data processing, such as, data analysis, scalable architectures, and algorithms for best managing and processing IoT data at different edges in the IoT ecosystem. In response to the DEBS'2021 contest, we design and implement a scalable solution for comparing previous year and current year air quality indexes for German Cities, as well as the calculus of cities' longest streaks of good air quality. Our solution is designed to be scalable. It's based on primo Apache Spark - an open-source unified analytics engine for large-scale data processing, and secundo Apache Sedona for creating spatial indexes, and performing spatial operations over large-scale spatial data.