Diabetes Prediction using Logistic Regression and Rule Extraction from Decision Tree and Random Forest Classifiers

M. Bhattacharya, D. Datta

Published in: 2023 4th International Conference for Emerging Technology (INCET)
Publication date: 2023-05-26
DOI: 10.1109/INCET57972.2023.10170270 (https://doi.org/10.1109/INCET57972.2023.10170270)
Citations: 0

Abstract

This work focuses on extracting rules from a decision tree classifier to predict whether a patient is diabetic. The machine learning classifiers use features such as glucose, blood pressure, insulin, skin thickness, body mass index (BMI), diabetes pedigree function, and age. Decision trees are easily interpretable classifiers, but their predictive accuracy is low. Random forest tree ensembles, in comparison, show high predictive accuracy but are regarded as black-box models. In this work, we have developed an algorithm that extracts decision rules from the corresponding tree in a human-readable format (IF antecedent, THEN consequent). We also provide a logistic regression model and the tree structure of a random forest model to classify the diabetic condition. Experimental results on 768 samples of women from the Pima Indians diabetes dataset show that the proposed rule extraction methodology outperforms similar recently developed methods in terms of human comprehension and limits the number of antecedents in the retained rules, while preserving the same level of accuracy. The performance of all classifier models is measured via the confusion matrix using metrics such as recall, precision, accuracy, and F1-score.
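
The idea of turning each root-to-leaf path of a decision tree into an IF-THEN rule, and of scoring a classifier from its confusion matrix, can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the tree is hand-written, the feature names come from the abstract, and every threshold, label, sample, and function name (`TOY_TREE`, `extract_rules`, `predict`, `confusion_metrics`) is hypothetical.

```python
from typing import Dict, List, Union

# A node is either a leaf (a class label string) or a dict describing a split.
# Hypothetical tree: feature names follow the abstract, but the thresholds
# and leaf labels are made up for illustration and are not from the paper.
Node = Union[str, Dict]

TOY_TREE: Node = {
    "feature": "glucose", "threshold": 127.5,
    "left": {
        "feature": "bmi", "threshold": 26.4,
        "left": "non-diabetic",
        "right": {
            "feature": "age", "threshold": 28.5,
            "left": "non-diabetic",
            "right": "diabetic",
        },
    },
    "right": "diabetic",
}


def extract_rules(node: Node, conditions=None) -> List[str]:
    """Walk every root-to-leaf path and emit one IF-THEN rule per leaf."""
    conditions = conditions or []
    if isinstance(node, str):  # leaf: the accumulated path is the antecedent
        antecedent = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {antecedent} THEN {node}"]
    below = f"{node['feature']} <= {node['threshold']}"
    above = f"{node['feature']} > {node['threshold']}"
    return (extract_rules(node["left"], conditions + [below])
            + extract_rules(node["right"], conditions + [above]))


def predict(node: Node, sample: Dict[str, float]) -> str:
    """Follow the splits until a leaf label is reached."""
    while not isinstance(node, str):
        side = "left" if sample[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node


def confusion_metrics(y_true, y_pred, positive="diabetic") -> Dict[str, float]:
    """Accuracy, precision, recall and F1 from binary confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    for rule in extract_rules(TOY_TREE):
        print(rule)
```

Each rule's antecedent is the conjunction of the split tests along one path, so the number of antecedents in a rule equals the depth of its leaf. Limiting tree depth is one simple way to keep rules short; the abstract does not say how the authors limit antecedents.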