M. R. Belgaum, Telugu Harsha Charitha, Munurathi Harini, B. Anusha, Ala Jayasri Sai, Undralla Chandana Yadav, Z. Alansari
Enhancing the Efficiency of Diabetes Prediction through Training and Classification using PCA and LR Model
Annals of Emerging Technologies in Computing, published 2023-07-01. DOI: https://doi.org/10.33166/aetic.2023.03.004
Abstract
In this paper, we introduce an approach for predicting the risk of diabetes that combines Principal Component Analysis (PCA) with Logistic Regression (LR), with the aim of making predictions both more accurate and more efficient. An effective diabetes prediction model needs to account for the clinical and demographic factors that contribute to the disease, which typically means training on a large dataset that includes those factors. Doing so clarifies how different characteristics affect the development of diabetes and supports more accurate predictions for individuals at risk. PCA is used to reduce the dimensionality of the dataset and improve the model's computational efficiency; the LR model then classifies patients as diabetic or non-diabetic. The proposed model is evaluated on the Pima Indian Diabetes Data (PIDD) using several metrics, including accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC). The results demonstrate a significant improvement over state-of-the-art methods: the PCA-LR model outperforms other algorithms, such as SVM and RF, especially in terms of accuracy, while keeping computational complexity low. The approach may have significant implications for improving healthcare outcomes and reducing healthcare costs, and could provide a practical and efficient solution for large-scale diabetes screening programs.
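The pipeline described in the abstract (dimensionality reduction with PCA, classification with LR, evaluation with accuracy, precision, recall, F1-score, and AUC) can be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the authors' implementation. Several things here are assumptions that the abstract does not specify: the synthetic data standing in for PIDD (shaped 768 rows by 8 features, as the dataset is commonly distributed), the feature standardization step, the 95% explained-variance threshold, the 75/25 stratified split, and the function name `run_pca_lr_sketch`.

```python
"""Minimal PCA + Logistic Regression sketch on synthetic stand-in data."""
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def run_pca_lr_sketch(variance_to_keep=0.95, seed=0):
    """Fit scale -> PCA -> LR and return component count and test metrics."""
    # ASSUMPTION: synthetic binary-classification data stands in for PIDD.
    # Replace this with the real feature matrix X and label vector y.
    X, y = make_classification(
        n_samples=768,
        n_features=8,
        n_informative=5,
        n_redundant=2,
        n_clusters_per_class=1,
        class_sep=1.5,
        random_state=seed,
    )

    # ASSUMPTION: a 75/25 stratified hold-out split; the abstract does not
    # describe the evaluation protocol.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )

    model = Pipeline(
        steps=[
            # PCA is scale-sensitive, so features are standardized first.
            ("scale", StandardScaler()),
            # A float n_components keeps enough components to explain that
            # fraction of the variance (requires the "full" SVD solver).
            ("pca", PCA(n_components=variance_to_keep, svd_solver="full")),
            ("lr", LogisticRegression(max_iter=1000)),
        ]
    )
    model.fit(X_train, y_train)

    y_pred = model.predict(X_test)
    # Probability of the positive class, needed for the ROC AUC.
    y_score = model.predict_proba(X_test)[:, 1]

    metrics = {
        "accuracy": float(accuracy_score(y_test, y_pred)),
        "precision": float(precision_score(y_test, y_pred)),
        "recall": float(recall_score(y_test, y_pred)),
        "f1": float(f1_score(y_test, y_pred)),
        "auc": float(roc_auc_score(y_test, y_score)),
    }
    return {
        "n_components": int(model.named_steps["pca"].n_components_),
        "metrics": metrics,
    }


if __name__ == "__main__":
    result = run_pca_lr_sketch()
    print("PCA components kept:", result["n_components"])
    for name, value in result["metrics"].items():
        print(f"{name}: {value:.3f}")
```

Wrapping the three steps in a single `Pipeline` means the scaler and PCA are fitted on the training split only, which avoids leaking test-set statistics into the model. The numbers printed for synthetic data say nothing about performance on PIDD.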