Machine Learning Model For Stunting Prediction
Sutarmi Sutarmi, Warijan Warijan, Tavip Indrayana, Dwi P. Putro B, Indra Gunawan
Jurnal Health Sains, published 2023-09-25
DOI: 10.46799/jhs.v4i9.1073 (https://doi.org/10.46799/jhs.v4i9.1073)
Citation count: 0
Abstract
This study aims to identify the best-performing Supervised Machine Learning (SML) model for stunting prediction. It takes an experimental approach on a custom dataset of 192 infant records, of which 183 are normal and 9 are stunted. The best performance was obtained by combining the Random Forest classification algorithm with Support Vector Machine weighting and Genetic Algorithm feature selection. The best-performing parameters are as follows. The data are split into 90% training data and 10% testing data. The Random Forest uses 100 trees, the Gain Ratio criterion, and a max_depth of 10. The Genetic Algorithm uses the Roulette Wheel selection method, a population of 20, a mutation value of 0.03, and a crossover value of 0.9. Validation uses k-fold cross-validation with k = 10. The study also identifies 44 supporting factors for stunting. Ranked by magnitude from largest to smallest, the top 10 are: (1) baby's weight at birth; (2) baby's height at birth; (3) number of meals per day; (4) breast milk; (5) number of diarrhea episodes per 3 months; (6) child development examination at home by a health worker during COVID; (7) mother's age at birth; (8) mother's height at birth; (9) number of siblings; (10) age when the first food was given. A limitation of this research is that the model was not tested on other datasets, so the reliability of the findings on different datasets is unknown.
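The Genetic Algorithm settings reported in the abstract (Roulette Wheel selection, population 20, mutation 0.03, crossover 0.9, 44 candidate factors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The fitness function is a toy stand-in: in the study, fitness would presumably be the cross-validated performance of the Random Forest on the selected features. The set of "informative" features, the single-point crossover, the number of generations, and all function names are assumptions made only for illustration.

```python
import random

N_FEATURES = 44     # candidate factors reported in the abstract
POP_SIZE = 20       # population size reported in the abstract
P_MUTATION = 0.03   # mutation value reported in the abstract
P_CROSSOVER = 0.9   # crossover value reported in the abstract

# ASSUMPTION: hypothetical ground truth used only by the toy fitness below.
INFORMATIVE = set(range(10))


def fitness(mask):
    """Toy fitness (assumption): reward informative features, lightly
    penalise subset size. Always positive, as roulette selection requires."""
    hits = sum(1 for i in INFORMATIVE if mask[i])
    return 1.0 + hits - 0.02 * sum(mask)


def roulette_select(population, scores, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0, sum(scores))
    running = 0.0
    for individual, score in zip(population, scores):
        running += score
        if running >= pick:
            return individual
    return population[-1]  # guard against floating-point round-off


def crossover(a, b, rng):
    """Single-point crossover (assumption), applied with P_CROSSOVER."""
    if rng.random() < P_CROSSOVER:
        point = rng.randrange(1, N_FEATURES)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a[:], b[:]


def mutate(mask, rng):
    """Flip each bit independently with probability P_MUTATION."""
    return [1 - bit if rng.random() < P_MUTATION else bit for bit in mask]


def run_ga(generations=50, seed=0):
    """Evolve binary feature masks; return the best mask seen so far."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(N_FEATURES)]
                  for _ in range(POP_SIZE)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        next_population = []
        while len(next_population) < POP_SIZE:
            parent_a = roulette_select(population, scores, rng)
            parent_b = roulette_select(population, scores, rng)
            child_a, child_b = crossover(parent_a, parent_b, rng)
            next_population += [mutate(child_a, rng), mutate(child_b, rng)]
        population = next_population[:POP_SIZE]
        best = max(population + [best], key=fitness)
    return best


if __name__ == "__main__":
    best_mask = run_ga()
    selected = [i for i, bit in enumerate(best_mask) if bit]
    print("selected feature indices:", selected)
    print("toy fitness:", round(fitness(best_mask), 2))
```

To adapt the sketch to real data, `fitness` would be replaced by a function that trains and scores a classifier on the columns selected by `mask`. With only 9 stunted records out of 192, that score should be a metric that is sensitive to the minority class, because plain accuracy is already about 95% for a model that always predicts "normal" (183/192).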