O. Ibidoja, Fam Pei Shan, Mukhtar, J. Sulaiman, Majid Khan Majahar Ali
Journal of the Nigerian Society of Physical Sciences, published 2023-02-04. DOI: 10.46481/jnsps.2023.1137
Robust M-estimators and Machine Learning Algorithms for Improving the Predictive Accuracy of Seaweed Contaminated Big Data
A common problem in regression analysis using ordinary least squares (OLS) is the effect of outliers or contaminated data on the parameter estimates, so a robust method that is insensitive to outliers and can handle contaminated data is needed. The objective of this study is to identify the significant parameters that determine the moisture content of seaweed after drying and to develop a hybrid model that reduces the number of outliers. The data were collected with sensors from the v-Groove Hybrid Solar Drier (v-GHSD) at Semporna, on the south-eastern coast of Sabah, Malaysia. After including second-order interactions, there are 435 drying parameters, each with 1914 observations. First, four machine learning algorithms (random forest, support vector machine, bagging and boosting) were used to determine the significant parameters by selecting 15, 25, 35 and 45 parameters. Second, the hybrid model was developed using the robust M-estimators M Bi-square, M Hampel and M Huber. The results show that the hybrid model significantly reduces the number of outliers and gives better predictions for the contaminated seaweed big data. For the 45 significant drying parameters with the highest variable importance, the hybrid model bagging M Bi-square performs best, with the lowest percentage of outliers, 4.08 %.
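To make the robust step concrete, the following is a minimal sketch of M-estimation by iteratively reweighted least squares, using the Huber, Bi-square and Hampel weight functions on a single-predictor line fit. It is not the authors' implementation. The synthetic data, the tuning constants (1.345, 4.685, and 2/4/8), the MAD-based scale, and the rule that flags an observation as an outlier when its standardized residual exceeds 3 are all assumptions chosen for illustration, and the helper names are hypothetical.

```python
import random
import statistics

def huber_weight(u, c=1.345):
    """Huber: full weight near zero, weight c/|u| beyond the cutoff c."""
    a = abs(u)
    return 1.0 if a <= c else c / a

def bisquare_weight(u, c=4.685):
    """Tukey Bi-square: smoothly decreasing weight, exactly zero beyond c."""
    a = abs(u)
    return (1.0 - (a / c) ** 2) ** 2 if a < c else 0.0

def hampel_weight(u, a=2.0, b=4.0, c=8.0):
    """Hampel three-part redescending weight (cutoffs a < b < c assumed)."""
    v = abs(u)
    if v <= a:
        return 1.0
    if v <= b:
        return a / v
    if v <= c:
        return a * (c - v) / ((c - b) * v)
    return 0.0

def weighted_line_fit(x, y, w):
    """Weighted least squares for y = intercept + slope * x."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def mad_scale(residuals):
    """Robust scale: median absolute deviation divided by 0.6745."""
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    return mad / 0.6745 or 1e-12  # guard against a zero scale

def m_estimate(x, y, weight_fn, n_iter=100, tol=1e-10):
    """IRLS: start from OLS, then reweight by scaled residuals until stable."""
    intercept, slope = weighted_line_fit(x, y, [1.0] * len(x))
    for _ in range(n_iter):
        res = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
        scale = mad_scale(res)
        w = [weight_fn(r / scale) for r in res]
        new_intercept, new_slope = weighted_line_fit(x, y, w)
        change = abs(new_intercept - intercept) + abs(new_slope - slope)
        intercept, slope = new_intercept, new_slope
        if change < tol:
            break
    return intercept, slope

def outlier_percentage(x, y, intercept, slope, cutoff=3.0):
    """Percentage of points with |standardized residual| > cutoff (assumed rule)."""
    res = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    scale = mad_scale(res)
    flagged = sum(1 for r in res if abs(r / scale) > cutoff)
    return 100.0 * flagged / len(res)

# Synthetic contaminated data (illustrative only, not the seaweed data):
# true line y = 2 + 0.5 x with unit Gaussian noise, last 5 points shifted by +40.
rng = random.Random(0)
x = [float(i) for i in range(50)]
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]
for i in range(45, 50):
    y[i] += 40.0

ols = weighted_line_fit(x, y, [1.0] * len(x))
fits = {
    "huber": m_estimate(x, y, huber_weight),
    "bisquare": m_estimate(x, y, bisquare_weight),
    "hampel": m_estimate(x, y, hampel_weight),
}

print("OLS slope:", round(ols[1], 3))
for name, (b0, b1) in fits.items():
    pct = outlier_percentage(x, y, b0, b1)
    print(name, "slope:", round(b1, 3), "flagged outliers (%):", round(pct, 1))
```

In this sketch the contaminated points pull the OLS slope away from 0.5, while the three M-estimators downweight them. In a hybrid setup like the one described, the predictors passed to the robust fit would be the parameters selected by the machine learning step rather than a single synthetic variable.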