Machine learning empowered prediction of geolocation using groundwater quality variables over YSR district of India
Jagadish Kumar MOGARAJU
Turkish Journal of Engineering and Environmental Sciences, 2023-04-12
DOI: 10.31127/tuje.1223779
Abstract
In this work, machine learning (ML) was used to predict geolocation with improved accuracy. The pre-processed data were analyzed with 22 machine learning regression algorithms. The Extra Trees Regressor gave the best accuracy in predicting latitude, longitude, and Haversine distance. CatBoost, Extreme Gradient Boosting, Light Gradient Boosting Machine, and Gradient Boosting Regressor models were also tested. The R² values obtained were 0.96 for longitude, 0.98 for latitude, and 0.96 for Haversine distance. The models were evaluated using MAE, MASE, RMSE, R², RMSLE, and MAPE, with R² treated as the most important metric. The influence of individual data points was assessed using Cook's distance. Fluoride had the strongest influence on the prediction accuracy of longitude, followed by residual sodium carbonate (RSC), Cl, SO4, sodium adsorption ratio (SAR), NO3, Na, Ca, electrical conductivity (EC), and pH. In the prediction of latitude, SAR played the most significant role, followed by Na and total hardness (TH). The t-SNE manifold showed three longitude values that differed markedly from the others. These findings are supported by diagnostic plots, including Cook's distance outlier detection, feature importance, t-SNE manifold, prediction error, residuals, recursive feature elimination with cross-validation (RFECV), and validation curve plots. This work shows that the challenge of predicting both latitude and longitude from a common set of variables is solved partially, if not completely, and that machine learning tools can be used for this purpose. Because the Haversine distance can be computed from latitude and longitude, it can also be used in the prediction of geolocation.
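The abstract does not say how the Haversine distance target was computed or which reference point it is measured from. As a minimal sketch, assuming the standard haversine formula on a spherical Earth with a mean radius of 6371 km, the distance between two points given in decimal degrees can be computed as follows (the function name `haversine_km` and the radius constant are illustrative assumptions, not taken from the paper):

```python
import math

# Assumed mean Earth radius in kilometres (spherical-Earth approximation).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2, radius_km=EARTH_RADIUS_KM):
    """Great-circle distance in km between two (lat, lon) points in decimal degrees.

    Hypothetical helper: a = sin^2(dphi/2) + cos(phi1) cos(phi2) sin^2(dlambda/2),
    d = 2 R asin(sqrt(a)).
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # Clamp to 1.0 so floating-point rounding near antipodal points cannot
    # push the argument of asin outside its domain.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))
```

To turn each sampling location into a single distance target, one would call `haversine_km(lat, lon, ref_lat, ref_lon)` against some fixed reference point; the choice of that point is an assumption here, since the abstract does not specify it.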
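The abstract names its evaluation metrics without defining them. The sketch below computes the standard definitions of MAE, RMSE, R², RMSLE, and MAPE in plain Python for one target (for example latitude). MSE is returned as well because RMSE is derived from it. MASE is left out because its scaling baseline is not described in the abstract. The function name `regression_metrics` is hypothetical, and MAPE is returned as a fraction rather than a percentage.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Standard regression metrics for one target variable (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n

    # R^2 = 1 - SS_res / SS_tot; raises ZeroDivisionError if every target is equal.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - sum(e * e for e in errors) / ss_tot

    # RMSLE uses log1p, so every value must be greater than -1.
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2 for t, p in zip(y_true, y_pred)) / n
    )

    # MAPE as a fraction; undefined if any true value is zero.
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n

    return {"MAE": mae, "MSE": mse, "RMSE": math.sqrt(mse), "R2": r2, "RMSLE": rmsle, "MAPE": mape}
```

R² is scale-free, which is one reason it is convenient for comparing targets with different units, such as degrees of latitude and kilometres of Haversine distance.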
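Cook's distance measures how much a fitted regression changes when a single observation is removed. The abstract does not state which model its Cook's distance plot was computed on. Purely as an illustration, the sketch assumes an ordinary least-squares fit with one predictor. For that model the leverage has the closed form h_i = 1/n + (x_i - x̄)²/S_xx, and D_i = e_i² h_i / (p s² (1 - h_i)²) with p = 2 parameters and s² the residual mean square. The function name `cooks_distance_simple` is hypothetical, and the 4/n cutoff in the usage section is only a common rule of thumb.

```python
from typing import List, Sequence


def cooks_distance_simple(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Cook's distance for each point of a simple linear regression y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have equal length of at least 3")
    p = 2  # fitted parameters: intercept and slope

    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx  # requires at least two distinct x values
    intercept = y_mean - slope * x_mean

    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Residual mean square; zero (and a division error) if the fit is exact.
    s2 = sum(e * e for e in residuals) / (n - p)

    distances = []
    for xi, e in zip(x, residuals):
        h = 1.0 / n + (xi - x_mean) ** 2 / sxx  # leverage of this observation
        distances.append((e * e) / (p * s2) * h / (1.0 - h) ** 2)
    return distances


if __name__ == "__main__":
    # Synthetic data; the last point is deliberately off the trend.
    xs = [1, 2, 3, 4, 5, 6]
    ys = [1.1, 1.9, 3.2, 3.9, 5.1, 12.0]
    d = cooks_distance_simple(xs, ys)
    print([i for i, v in enumerate(d) if v > 4 / len(d)])
```

A point with both a large residual and high leverage dominates Cook's distance. This is why such plots are used to flag observations that may be distorting a fitted model.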