Multi-Class Classification Method with Feature Engineering for Predicting Hypertension with Diabetes

Mongkhon Sinsirimongkhon, Sujitra Arwatchananukul, P. Temdee
Journal: J. Mobile Multimedia
Published: 2023-02-15
DOI: https://doi.org/10.13052/jmm1550-4646.1937
Citations: 1

Abstract

Machine learning–based methods are widely applied to predict noncommunicable diseases (NCDs) such as hypertension, diabetes, and cardiovascular disease. However, few models have been developed for predicting hypertension with diabetes, even though the two diseases commonly co-occur and can cause devastating harm to patients. This paper proposes a multi-class classification method for predicting hypertension with diabetes. The proposed method consists of data preprocessing, model construction and validation, and model comparison. For data preprocessing, feature engineering is applied according to the data type of each feature. For model construction, several machine learning methods are applied: Random Forest (RF), Gradient Boosting (GB), Extra Tree (ET), Decision Tree (DCT), and Support Vector Machine (SVM). The dataset used in this study consists of 17,077 records and 28 features, obtained from Phaya Mengrai Hospital, Chiang Rai, Thailand. The predictive performance of each model, with and without feature engineering, is compared in terms of accuracy and average area under the Receiver Operating Characteristic curve (AUC-ROC). In this comparison, SVM with feature engineering outperformed the other models, achieving an accuracy of 88.39% and an average AUC-ROC of 93.32%. Among the ensemble learning–based methods, RF performed best in both accuracy and average AUC-ROC, with and without feature engineering. Overall, all models performed better when feature engineering was applied.
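The two comparison metrics named above can be illustrated with a small, dependency-free sketch. The abstract does not say how the AUC-ROC is averaged across classes, so the code below assumes an unweighted (macro) average of one-vs-rest AUCs. The three class labels, the probability table, and all function names are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of records whose predicted class equals the true class."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC computed as the probability that a randomly chosen positive
    record scores higher than a randomly chosen negative one (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(y_true: Sequence[int], proba: Sequence[Sequence[float]]) -> float:
    """Assumed averaging scheme: treat each class as 'positive vs. the rest',
    compute its AUC, then take the unweighted mean over classes."""
    n_classes = len(proba[0])
    aucs: List[float] = []
    for k in range(n_classes):
        labels = [1 if t == k else 0 for t in y_true]
        scores = [row[k] for row in proba]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / n_classes


# Hypothetical toy data: six records, three classes, one probability row each.
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.70, 0.20, 0.10],
    [0.50, 0.30, 0.20],
    [0.20, 0.60, 0.20],
    [0.40, 0.35, 0.25],
    [0.10, 0.30, 0.60],
    [0.20, 0.50, 0.30],
]
# Predicted class = index of the largest probability in each row.
y_pred = [max(range(len(row)), key=row.__getitem__) for row in proba]

print("accuracy:", accuracy(y_true, y_pred))
print("macro one-vs-rest AUC-ROC:", macro_ovr_auc(y_true, proba))
```

The pairwise loop is quadratic in the number of records, which is fine for illustration; on a dataset of the size reported here, a rank-based or library implementation would be the more practical choice.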