KNN imputation to missing values of regression-based rain duration prediction on BMKG data

Ikke Dian Oktaviani, Aji Gautama Putrada
Jurnal Infotel, published 2022-11-01. DOI: https://doi.org/10.20895/infotel.v14i4.840
Citations: 4

Abstract

Predicting rain duration from Meteorology, Climatology, and Geophysics Agency (BMKG) data is an important but still open problem. Several studies have also shown that missing values can degrade a model's predictive performance. This study proposes k-nearest neighbors (KNN) imputation to handle missing values when predicting rain duration, using a dataset sourced from BMKG. For the regression model, we compared gradient boosting regression (GBR), adaptive boosting regression (ABR), and linear regression (LR). We compared KNN imputation against three benchmark methods: zero imputation, mean imputation, and iterative imputation. The coefficient of determination (R²), mean squared error (MSE), and mean bias error (MBE) measure the performance of these imputation methods. The test results show that GBR performs best among the regression models on both training and test data, with R² = 0.915 and 0.776, respectively. Our proposed KNN imputation outperforms the benchmark imputation methods: at a missing percentage of 90%, predictions made with KNN imputation reach R² = 0.71 and MSE = 0.36.
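The abstract does not say how the KNN imputation was implemented, which k was used, or which distance metric applied, so the following is only a minimal sketch of the general technique, not the authors' method. It assumes a NaN-aware Euclidean distance (in the spirit of the default used by scikit-learn's `KNNImputer`), an unweighted average over the k nearest donor rows, and an MBE sign convention of prediction minus observation. The function names and the toy data are hypothetical and are not BMKG data. Zero and mean imputation are included as the simple baselines the abstract mentions; iterative imputation is omitted for brevity.

```python
import math

NAN = float("nan")


def nan_euclidean(a, b):
    """Euclidean distance over coordinates present in both rows,
    rescaled by (total coordinates / shared coordinates)."""
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf  # the two rows have nothing to compare on
    squared = sum((x - y) ** 2 for x, y in shared)
    return math.sqrt(len(a) / len(shared) * squared)


def knn_impute(rows, k=3):
    """Fill each NaN with the mean of that column over the k nearest
    rows (measured on the original data) that observed the column."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isnan(value):
                continue
            # Candidate donors: every other row with column j observed.
            donors = sorted(
                (nan_euclidean(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and not math.isnan(other[j])
            )
            nearest = [v for d, v in donors if d != math.inf][:k]
            if nearest:  # leave the NaN in place if no donor is usable
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


def mean_impute(rows):
    """Baseline: fill each NaN with the mean of its column."""
    means = []
    for col in zip(*rows):
        present = [v for v in col if not math.isnan(v)]
        means.append(sum(present) / len(present) if present else 0.0)
    return [[means[j] if math.isnan(v) else v for j, v in enumerate(r)]
            for r in rows]


def zero_impute(rows):
    """Baseline: fill each NaN with 0."""
    return [[0.0 if math.isnan(v) else v for v in r] for r in rows]


def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mbe(y_true, y_pred):
    """Mean bias error, assumed here to be mean(prediction - observation)."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data: column 1 is roughly twice column 0; one cell is masked.
    rows = [[1.0, 2.0], [1.1, NAN], [5.0, 10.0], [0.9, 1.8]]
    truth = [2.2]  # hypothetical true value of the masked cell
    methods = [("knn", knn_impute(rows, k=2)),
               ("mean", mean_impute(rows)),
               ("zero", zero_impute(rows))]
    for name, filled in methods:
        pred = [filled[1][1]]
        print(name, pred, "MSE:", mse(truth, pred), "MBE:", mbe(truth, pred))
```

For brevity, the demo scores the imputed cell directly. The study instead reports R², MSE, and MBE for the rain duration predictions of the regression model trained on imputed data, so a closer reproduction would fit a regressor such as GBR on each imputed dataset and score its predictions.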