Detecting Anomaly and Replacement Prediction for Rainfall Open Data in Thailand

2021 18th International Joint Conference on Computer Science and Software Engineering (JCSSE). Pub Date: 2021-06-30. DOI: 10.1109/JCSSE53117.2021.9493814

Intouch Prakaisak, E. Phaisangittisagul, M. Maleewong, Kanoksri Sarinnapakorn, C. Phongpensri

Citations: 2

Abstract

Rainfall data sets often contain missing values because sensors break easily. In Thailand, many public agencies collect rainfall data, including National Hydro Informatics (HII) and the Thai Meteorological Department, because the data are valuable for rainfall prediction, which matters for an agricultural country like Thailand. Rainfall is normally recorded hourly, and because there are many sensor locations, the sensors are hard to maintain. Sensor readings can be lost transiently and/or contain anomalous values. Because so much data flows to the server every day, inspecting it manually, or even semi-manually, is impractical. In collaboration with HII, this project develops a system that automates the rainfall data quality improvement process, using machine learning algorithms as tools for data cleansing. The cleansed data can then be published as an open data set that developers can build on to create new innovations. We explore the characteristics of the data set and adopt both statistical and machine learning methods. The results show that combining statistical and machine learning methods yields higher accuracy than using either approach alone. We also develop a web application that visualizes the cleansed rainfall data and connects to the models in the automatic cleansing pipeline.
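The abstract does not say which statistical tests or machine learning models were used, so what follows is only a minimal sketch of the general two-stage idea under stated assumptions. A hypothetical `cleanse` function first drops missing, negative, or implausibly large hourly readings (the 150 mm/h cap, `k = 3`, and the 2 mm tolerance floor are assumed values), then fits a least-squares line predicting the target station from the mean of neighbouring stations (a deliberately simple stand-in for the paper's machine learning models, assuming the neighbouring series are complete and already clean). It flags readings whose residual exceeds `k` robust standard deviations and replaces every flagged reading with the refitted prediction. The names `cleanse`, `fit_line`, and `MAX_HOURLY_MM` are illustrative, not the authors' code.

```python
from statistics import mean, median
from typing import List, Optional, Sequence, Tuple

# Assumed plausibility cap for hourly rainfall (mm); illustrative, not from the paper.
MAX_HOURLY_MM = 150.0


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def cleanse(target: List[Optional[float]],
            neighbours: List[List[float]],
            k: float = 3.0,
            min_tol_mm: float = 2.0) -> Tuple[List[float], List[int]]:
    """Hypothetical cleansing step: returns (cleaned series, flagged indices)."""
    # Predictor: mean rainfall across neighbouring stations at each hour.
    x = [mean(hour) for hour in zip(*neighbours)]
    # Check 1 (plausibility): drop missing, negative, or implausibly large readings.
    ok = [i for i, v in enumerate(target)
          if v is not None and 0.0 <= v <= MAX_HOURLY_MM]
    a, b = fit_line([x[i] for i in ok], [target[i] for i in ok])
    # Check 2 (statistical): flag large residuals, using the median absolute
    # deviation (scaled by 1.4826 to approximate a standard deviation) as spread.
    res = {i: target[i] - (a + b * x[i]) for i in ok}
    med = median(res.values())
    mad = median(abs(r - med) for r in res.values())
    tol = max(k * 1.4826 * mad, min_tol_mm)
    good = [i for i in ok if abs(res[i] - med) <= tol]
    # Refit on readings that passed both checks, then predict replacements,
    # clipping at zero because rainfall cannot be negative.
    a, b = fit_line([x[i] for i in good], [target[i] for i in good])
    keep = set(good)
    flagged = [i for i in range(len(target)) if i not in keep]
    cleaned = [target[i] if i in keep else max(0.0, a + b * x[i])
               for i in range(len(target))]
    return cleaned, flagged


if __name__ == "__main__":
    n1 = [0, 0, 2, 4, 6, 4, 2, 0, 0, 0]
    n2 = [0, 0, 2, 4, 6, 4, 2, 0, 0, 0]
    target = [0, 0, 2, 4, None, 4, 2, 80, -1, 0]  # gap, spike, negative value
    cleaned, flagged = cleanse(target, [n1, n2])
    print(flagged)  # → [4, 7, 8]
```

Neighbouring stations are used as predictors on the assumption that nearby gauges record correlated rainfall. The median absolute deviation is used instead of the standard deviation so that a single spike, such as the 80 mm reading above, cannot inflate the threshold enough to hide itself.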
