Improving Classification Performance for Diabetes with Linear Discriminant Analysis and Genetic Algorithm
A. Alharan, Zahraa M. Algelal, Nabeel Salih Ali, Nora Al-Garaawi
2021 Palestinian International Conference on Information and Communication Technology (PICICT), published 2021-09-01
DOI: 10.1109/PICICT53635.2021.00019 (https://doi.org/10.1109/PICICT53635.2021.00019)
Citations: 5
Abstract
Diabetes is one of the most serious chronic diseases facing humanity today. According to the International Diabetes Federation (IDF) Diabetes Atlas, Ninth Edition (2019), 463 million people worldwide had diabetes, and the disease caused approximately 4.2 million deaths. Diabetic patients therefore need state-of-the-art healthcare, and early prediction can help reduce the risks associated with the disease. In this context, this research proposes a diabetes diagnosis system and analyzes two diabetes datasets: PIMA Indian Diabetes (Dataset I) and the data of Dr. John Schorling (Dataset II). Linear Discriminant Analysis (LDA) and a Genetic Algorithm (GA) are used for feature selection, and four classifiers are evaluated: Bagging, Random Forest, Logistic Model Tree (LMT), and JRip. The results show that a random forest classifier using LDA and GA achieved higher accuracy (90.89%) than the other configurations on Dataset I. On Dataset II, the random forest, random forest-LDA, and JRip-LDA classifiers performed better than the GA-based variants (91.44%).
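The abstract names a genetic algorithm as one of the feature-selection methods but gives no implementation details. The following is a minimal sketch of GA-based feature selection in general, not the authors' method. It makes several loud assumptions: the data is synthetic (standing in for a real dataset such as PIMA), the fitness function is the hold-out accuracy of a simple nearest-centroid classifier (swapped in for the paper's Bagging, Random Forest, LMT, and JRip classifiers), and all names (`make_data`, `centroid_accuracy`, `ga_select`) and parameter values are hypothetical.

```python
import random


def make_data(n=200, n_features=8, informative=(0, 1, 2), seed=0):
    """Synthetic binary dataset (assumption): only `informative` columns depend on the label."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        row = [rng.gauss(1.5 * label if j in informative else 0.0, 1.0)
               for j in range(n_features)]
        X.append(row)
        y.append(label)
    return X, y


def centroid_accuracy(X, y, mask):
    """Fitness: hold-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    half = len(X) // 2
    # Fit one centroid per class on the first half of the data.
    cents = {}
    for c in (0, 1):
        rows = [X[i] for i in range(half) if y[i] == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    # Score on the second half by squared Euclidean distance to each centroid.
    correct = 0
    for i in range(half, len(X)):
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(cols, cents[c]))
                for c in cents}
        correct += min(dist, key=dist.get) == y[i]
    return correct / (len(X) - half)


def ga_select(X, y, pop_size=20, generations=15, p_mut=0.1, seed=1):
    """Evolve bit masks over features; returns (best_mask, best_fitness, history)."""
    rng = random.Random(seed)
    n = len(X[0])

    def fitness(mask):
        return centroid_accuracy(X, y, mask)

    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[:2]  # elitism: the two best masks survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(pop[:pop_size // 2], 2)  # parents from the top half
            cut = rng.randint(1, n - 1)                # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    best = max(pop, key=fitness)
    return best, fitness(best), history
```

Calling `ga_select(*make_data())` returns a 0/1 mask over the eight synthetic features, its hold-out accuracy, and the best fitness recorded at the start of each generation. Because the two best masks are carried over unchanged, that recorded fitness never decreases. In a real study the fitness function would wrap whichever classifier is being evaluated, ideally with cross-validation rather than a single split.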