Machine learning empowered prediction of geolocation using groundwater quality variables over YSR district of India

Turkish Journal of Engineering and Environmental Sciences. Pub Date: 2023-04-12. DOI: 10.31127/tuje.1223779

Jagadish Kumar MOGARAJU



Abstract

This work uses machine learning (ML) to predict geolocation from groundwater quality variables with improved accuracy. The pre-processed data were analysed with 22 machine learning algorithms in regression mode. The Extra Trees Regressor performed best in predicting latitude, longitude, and Haversine distance. Other regression models, including CatBoost, Extreme Gradient Boosting, Light Gradient Boosting Machine, and Gradient Boosting Regressor, were also tested. The R² values obtained were 0.96 (longitude), 0.98 (latitude), and 0.96 (Haversine distance). Models were evaluated using MAE, MASE, RMSE, R², RMSLE, and MAPE, with R² treated as the most important metric. The influence of individual data points was assessed using Cook's distance. Fluoride had the strongest impact on the prediction accuracy of longitude, followed by RSC, Cl, SO4, SAR, NO3, Na, Ca, EC, and pH. In the prediction of latitude, SAR played the most significant role, followed by Na and TH. According to the t-SNE manifold, three longitude values were quite different from the others. The results are supported by diagnostics including Cook's distance outlier detection, a feature importance plot, a t-SNE manifold, a prediction error plot, a residuals plot, an RFECV plot, and a validation curve. The work reports that the challenge of predicting both latitude and longitude on common ground is solved partially, if not completely, and that machine learning tools can be used for this purpose. Haversine distance can be obtained from latitude and longitude and can be used in the prediction of geolocation.
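The abstract notes that Haversine distance can be derived from latitude and longitude. As a minimal sketch of that derivation, the function below computes the great-circle distance between two points using the standard Haversine formula. The function name `haversine_km`, the Earth-radius constant, and the choice of kilometres are assumptions made for illustration; the abstract does not state which reference point or units the author used.

```python
import math

# Mean Earth radius in kilometres (an assumed constant, not taken from the paper).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    a = min(1.0, a)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Under these assumptions, a single regression target could be built by calling `haversine_km(ref_lat, ref_lon, lat, lon)` for each sample against a fixed reference point, where `ref_lat` and `ref_lon` are hypothetical values chosen by the analyst.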


