Robust M-estimators and Machine Learning Algorithms for Improving the Predictive Accuracy of Seaweed Contaminated Big Data

Journal of the Nigerian Society of Physical Sciences. Pub Date: 2023-02-04. DOI: 10.46481/jnsps.2023.1137

O. Ibidoja, Fam Pei Shan, Mukhtar, J. Sulaiman, Majid Khan Majahar Ali

Citations: 5

Abstract


A common problem in regression analysis using ordinary least squares (OLS) is that outliers or contaminated data distort the parameter estimates. A robust method that is insensitive to outliers and can handle contaminated data is therefore needed. The objective of this study is to determine the significant parameters that govern the moisture content of seaweed after drying and to develop a hybrid model that reduces the outliers. The data were collected by sensors on the v-Groove Hybrid Solar Drier (v-GHSD) at Semporna, on the south-eastern coast of Sabah, Malaysia. After adding second-order interactions, we have 435 drying parameters, each with 1914 observations. First, we used four machine learning algorithms (random forest, support vector machine, bagging and boosting) to determine the significant parameters, selecting the top 15, 25, 35 and 45 parameters. Second, we developed hybrid models by combining these algorithms with the robust M-estimators Bi-square, Hampel and Huber. The results show a significant improvement for the contaminated seaweed big data: the hybrid models reduce the number of outliers and give better predictions. Using the 45 drying parameters with the highest variable importance, the hybrid bagging M Bi-square model performs best, with the lowest percentage of outliers (4.08%).
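
The abstract does not spell out how the 435 parameters were built or how the top-k parameters were picked. The sketch below is one plausible reading, not the authors' code. It assumes the second-order terms are pairwise products of the base sensor variables with no squared terms, which would give 435 columns from 29 base variables (29 + 29·28/2 = 29 + 406 = 435). The base-variable count, the helper names `second_order_expand` and `top_k`, and the random importance scores (standing in for random-forest, SVM, bagging or boosting importance) are all hypothetical.

```python
import itertools

import numpy as np


def second_order_expand(X, names):
    """Return main effects plus all pairwise products x_i * x_j (i < j).

    Squared terms are deliberately not added (an assumption, not from the paper).
    """
    cols, out_names = [X], list(names)
    for i, j in itertools.combinations(range(X.shape[1]), 2):
        cols.append((X[:, i] * X[:, j])[:, None])
        out_names.append(f"{names[i]}:{names[j]}")
    return np.hstack(cols), out_names


def top_k(importance, k):
    """Indices of the k largest importance scores, most important first."""
    return np.argsort(importance)[::-1][:k]


rng = np.random.default_rng(1)
n_obs, n_base = 1914, 29               # 1914 from the abstract; 29 is an assumption
X = rng.normal(size=(n_obs, n_base))   # synthetic stand-in for the sensor data
names = [f"x{i + 1}" for i in range(n_base)]

Xe, names_e = second_order_expand(X, names)
print(Xe.shape)                        # (1914, 435)

# Placeholder scores; in the study these would come from a fitted
# random forest, SVM, bagging or boosting model.
importance = rng.random(Xe.shape[1])
for k in (15, 25, 35, 45):
    sel = top_k(importance, k)
    print(k, [names_e[i] for i in sel[:3]])
```

Each selected subset `Xe[:, sel]` would then be passed to the robust regression stage.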

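The robust half of the hybrid model rests on regression M-estimators. The abstract names Huber, Hampel and Bi-square but gives no formulas or tuning constants, so the following is a minimal sketch assuming the textbook weight functions fitted by iteratively reweighted least squares (IRLS), with a MAD-type residual scale, median(|r|)/0.6745, re-estimated at each step. The constants 1.345 (Huber), 2/4/8 (Hampel) and 4.685 (Bi-square) are the defaults of R's `MASS::rlm` psi functions; whether the authors used them is not stated. All function names are hypothetical and the data are synthetic, not v-GHSD measurements. The redescending estimators (Hampel, Bi-square) are started from the Huber fit because their objective can have several local minima.

```python
import numpy as np


def huber_weights(u, k=1.345):
    """Huber: weight 1 inside [-k, k], weight k/|u| outside."""
    a = np.abs(u)
    return np.where(a <= k, 1.0, k / np.maximum(a, 1e-12))


def hampel_weights(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending weights; zero weight beyond c."""
    x = np.abs(u)
    safe = np.maximum(x, 1e-12)
    w = np.ones_like(x)
    mid = (x > a) & (x <= b)
    w[mid] = a / safe[mid]
    tail = (x > b) & (x <= c)
    w[tail] = a * (c - x[tail]) / ((c - b) * safe[tail])
    w[x > c] = 0.0
    return w


def bisquare_weights(u, c=4.685):
    """Tukey bisquare: smooth down-weighting, zero weight beyond c."""
    a = np.abs(u)
    return np.where(a < c, (1.0 - (a / c) ** 2) ** 2, 0.0)


def m_estimate(X, y, weight_fn, init=None, max_iter=200, tol=1e-10):
    """Regression M-estimate by IRLS; an intercept column is prepended to X."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta = np.linalg.lstsq(X1, y, rcond=None)[0] if init is None else init.copy()
    for _ in range(max_iter):
        r = y - X1 @ beta
        scale = np.median(np.abs(r)) / 0.6745   # MAD-type scale, as in MASS::rlm
        if scale == 0:
            break
        sw = np.sqrt(weight_fn(r / scale))      # row weights enter as sqrt(w)
        new = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)[0]
        done = np.max(np.abs(new - beta)) < tol
        beta = new
        if done:
            break
    return beta


# Synthetic check: true line y = 1 + 2x, with 10% of rows shifted upward.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, x.size)
y[-20:] += 30.0
X = x[:, None]

ols = np.linalg.lstsq(np.column_stack([np.ones(x.size), x]), y, rcond=None)[0]
huber = m_estimate(X, y, huber_weights)
hampel = m_estimate(X, y, hampel_weights, init=huber)
bisq = m_estimate(X, y, bisquare_weights, init=huber)

for name, b in [("OLS", ols), ("Huber", huber), ("Hampel", hampel), ("Bi-square", bisq)]:
    print(f"{name:<10} intercept={b[0]:7.3f} slope={b[1]:6.3f}")
```

Because the shifted rows sit at the high end of x, they should drag the OLS slope well away from 2, while the M-estimators stay close to it. Huber keeps a small bounded bias, and the two redescending estimators give the contaminated rows zero weight once their standardized residuals pass the cut-off.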