Machine Learning Based Diabetes Detection Model for False Negative Reduction

Biomedical materials & devices (New York, N.Y.). Pub Date: 2023-06-21. DOI: 10.1007/s44174-023-00104-w

Md. Ashraf Uddin, Md. Manowarul Islam, Md. Alamin Talukder, Md. Al Amin Hossain, Arnisha Akhter, Sunil Aryal, Maisha Muntaha


Citations: 2

Abstract

Diabetes is a chronic disease characterized by the inability of the pancreas to produce enough insulin or the body’s inability to use insulin efficiently. The disease is becoming increasingly prevalent worldwide and can result in severe complications such as blindness, kidney failure, and stroke. Early detection of diabetes can potentially save millions of lives globally, making it a crucial focus of research. In this study, we propose a machine learning model to aid in predicting diabetes. The model comprises several machine learning methods: Linear Regression (LnR), Logistic Regression (LR), k-nearest neighbor (KNN), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and Decision Tree (DT). Before feeding the data into the machine learning model for evaluation, we applied several pre-processing steps, such as removing null values, standardizing the data using normalization, and labeling the data using label encoding. Imbalanced datasets can adversely affect the accuracy of machine learning algorithms, and we address this problem by balancing the datasets using the Synthetic Minority Oversampling Technique (SMOTE). We assessed the model’s performance on two datasets and found that the random forest algorithm produced the best results, with 97% accuracy on the diabetes dataset 2019 and 80% accuracy on the Pima Indian dataset. However, using a balanced dataset, we can significantly reduce the number of false-negative detections.
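The balancing step and the false-negative measure mentioned in the abstract can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the helper names (`smote`, `false_negatives`, `recall`), the choice of `k`, and the toy data are all assumptions made for illustration. It only shows the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) and how false negatives relate to recall.

```python
import math
import random

def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def false_negatives(y_true, y_pred, positive=1):
    """Count positive cases that the classifier labelled as negative."""
    return sum(1 for t, p in zip(y_true, y_pred)
               if t == positive and p != positive)

def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN); fewer false negatives means higher recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred)
             if t == positive and p == positive)
    fn = false_negatives(y_true, y_pred, positive)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Toy data (hypothetical values, not taken from either dataset in the paper).
majority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
            [1.1, 1.3], [0.9, 0.7], [1.3, 1.0]]
minority = [[3.0, 3.0], [3.2, 2.8], [2.9, 3.3]]

# Oversample the minority class until both classes have the same size.
n_missing = len(majority) - len(minority)
balanced_minority = minority + smote(minority, n_missing, k=2)
```

In practice a maintained implementation such as `imblearn.over_sampling.SMOTE` from the imbalanced-learn package would normally be preferred, and oversampling is usually applied to the training split only so that synthetic points do not leak into the evaluation data.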

