Optimizing Bagged Trees in an Ensemble Classifier for Improved Prediction of Diabetes Prevalence in Women

Impact factor: 0.6 · JCR Q3 · Multidisciplinary Sciences

Pertanika Journal of Science and Technology · Pub Date: 2024-07-22 · DOI: 10.47836/pjst.32.4.16

Jose Candia Jr., Airish Mae Adonis, Jesica Perlas

Citations: 0

Abstract

This study optimizes the performance of bagged trees in an ensemble classifier for predicting diabetes prevalence in women. The dataset comprised 1,888 women described by six features: age, BMI, glucose level, insulin level, blood pressure, and pregnancy status. It was split into training and testing sets at a 70:30 ratio, and the bagged tree ensemble classifier was trained with five-fold cross-validation. Using all features during training yielded a training accuracy of 92.3% and a testing accuracy of 99.5%. Three optimization techniques then improved on this baseline: feature selection, parameter tuning, and limiting the maximum number of splits. Feature selection improved accuracy by 0.2%, and parameter tuning improved test accuracy by 0.2%. Decreasing the maximum number of splits from 1322 to 800 or 600 raised validation accuracy by 0.1%. Finally, the optimized bagged tree models were evaluated on accuracy, precision, recall, and F1 score. Model 1 (maximum of 800 splits, 50 learners) outperformed Model 2 in recall and F1 score, while Model 2 (maximum of 600 splits, 50 learners) achieved higher precision. The study concludes that optimization techniques can significantly improve the performance of bagged trees in predicting diabetes prevalence in women.
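The workflow described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not name the software used, so scikit-learn is an assumption here, and the data are synthetic stand-ins generated with `make_classification` because the study's dataset is not reproduced on this page. The helper name `build_bagged_trees` is hypothetical. The "maximum number of splits" is approximated through `max_leaf_nodes`, since a binary tree with k splits has k + 1 leaves.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# ASSUMPTION: synthetic stand-in for the study's data (1,888 rows, six features).
# The real features are age, BMI, glucose, insulin, blood pressure, pregnancy status.
X, y = make_classification(
    n_samples=1888, n_features=6, n_informative=4, n_redundant=1, random_state=0
)

# 70:30 train/test split, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)


def build_bagged_trees(max_splits, n_learners=50):
    """Hypothetical helper: bagged decision trees with a cap on splits per tree.

    ASSUMPTION: max_leaf_nodes = max_splits + 1 stands in for a
    'maximum number of splits' setting.
    """
    base_tree = DecisionTreeClassifier(max_leaf_nodes=max_splits + 1, random_state=0)
    return BaggingClassifier(base_tree, n_estimators=n_learners, random_state=0)


results = {}
for name, max_splits in {"Model 1": 800, "Model 2": 600}.items():
    model = build_bagged_trees(max_splits)
    # Five-fold cross-validation on the training portion gives a validation accuracy.
    cv_accuracy = cross_val_score(
        model, X_train, y_train, cv=5, scoring="accuracy"
    ).mean()
    # Refit on the full training set, then score on the held-out 30%.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "cv_accuracy": cv_accuracy,
        "test_accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred),
        "recall": recall_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred),
    }

for name, metrics in results.items():
    print(name, {k: round(float(v), 3) for k, v in metrics.items()})
```

Because the data are synthetic, the printed numbers have no bearing on the accuracies reported in the abstract; the sketch only shows how the split ratio, cross-validation, split cap, and learner count fit together.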

Source journal

Pertanika Journal of Science and Technology (Multidisciplinary Sciences)

CiteScore: 1.50

Self-citation rate: 16.70%

Articles published: 178

About the journal: Pertanika Journal of Science and Technology aims to provide a forum for high-quality research in science and engineering. Areas relevant to the scope of the journal include: bioinformatics, bioscience, biotechnology and bio-molecular sciences, chemistry, computer science, ecology, engineering, engineering design, environmental control and management, mathematics and statistics, medicine and health sciences, nanotechnology, physics, safety and emergency management, and related fields of study.