A clinical decision support system using rough set theory and machine learning for disease prediction

Impact factor: 4.4; JCR Q1 (Computer Science, Interdisciplinary Applications)
Kamakhya Narain Singh, Jibendu Kumar Mantri
Journal: Intelligent medicine, Volume 4, Issue 3, Pages 200-208
Publication date: 2024-08-01
Publication type: Journal Article
DOI: 10.1016/j.imed.2023.08.002
Article page: https://www.sciencedirect.com/science/article/pii/S2667102624000196
Citation count: 0

Abstract

Objective

Technological advances have drastically changed daily life, and healthcare in particular: traditional diagnostic methods are being replaced by technology-oriented models, and paper-based patient records by digital files. Using the latest technology and data mining techniques, we aimed to develop an automated clinical decision support system (CDSS) to improve patient prognoses and healthcare delivery. Our proposed approach placed a strong emphasis on improvements that meet patient, parent, and physician expectations. We developed a flexible framework to identify hepatitis, dermatological conditions, hepatic disease, and autism in adults and to provide the results to patients as recommendations. The novelty of this CDSS lies in its integration of rough set theory (RST) and machine learning (ML) techniques to improve the accuracy and effectiveness of clinical decision-making.

Methods

Data were collected from various web-based resources. Standard preprocessing techniques were applied to encode categorical features, perform min-max scaling, and remove null and duplicate entries. Missing categorical values were filled with the most prevalent feature value in the class, and missing continuous values were filled using the standard deviation. A rough set approach was applied for feature selection to remove highly redundant and irrelevant elements. Then, various ML techniques, including K nearest neighbors (KNN), linear support vector machine (LSVM), radial basis function support vector machine (RBF SVM), decision tree (DT), random forest (RF), and Naive Bayes (NB), were employed to analyze four publicly available benchmark medical datasets of different types from the UCI repository and Kaggle. The model was implemented in Python, and various validity metrics, including precision, recall, F1-score, and root mean square error (RMSE), were applied to measure its performance.
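The abstract does not say which rough set algorithm was used for feature selection. As a minimal sketch only, the following assumes a greedy QuickReduct-style search driven by the rough-set dependency degree (the fraction of records whose equivalence class under the chosen attributes maps to a single decision label). The function names, the toy records, and the attribute names `fever`, `rash`, and `sex` are hypothetical illustrations, not taken from the paper or its datasets.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into equivalence classes: rows that agree on every attribute in attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: share of rows lying in classes whose members all carry one label."""
    if not rows:
        return 0.0
    consistent = sum(
        len(block)
        for block in partition(rows, attrs)
        if len({labels[i] for i in block}) == 1
    )
    return consistent / len(rows)


def quick_reduct(rows, labels, all_attrs):
    """Greedily add the attribute that raises dependency most, until the subset
    reaches the dependency of the full attribute set (an assumed stopping rule)."""
    target = dependency(rows, labels, all_attrs)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max(
            (a for a in all_attrs if a not in reduct),
            key=lambda a: dependency(rows, labels, reduct + [a]),
        )
        reduct.append(best)
    return reduct


# Hypothetical, already-discretized toy records (not from the paper's datasets).
rows = [
    {"fever": "yes", "rash": "no", "sex": "m"},
    {"fever": "yes", "rash": "yes", "sex": "f"},
    {"fever": "no", "rash": "no", "sex": "m"},
    {"fever": "no", "rash": "yes", "sex": "f"},
    {"fever": "yes", "rash": "no", "sex": "f"},
]
labels = ["sick", "sick", "healthy", "healthy", "sick"]

print(quick_reduct(rows, labels, ["fever", "rash", "sex"]))  # ['fever']
```

In this toy table the label is fully determined by `fever`, so the other two attributes are dropped as redundant. Rough set methods of this kind operate on discrete values, so continuous features would need to be discretized first; how the authors handled that step is not stated in the abstract.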

Results

Features were selected using an RST approach and examined by RF analysis. The important features of hepatitis, dermatology conditions, hepatic disease, and autism determined by RST and by RF exhibited 92.85 %, 90.90 %, 100 %, and 80 % similarity, respectively. Selected features were stored as electronic health records, and various ML classifiers (KNN, LSVM, RBF SVM, DT, RF, and NB) were applied to classify patients with hepatitis, dermatology conditions, hepatic disease, and autism. In the last phase, the performance of the proposed classifiers was compared with that of existing state-of-the-art methods using various validity measures. RF was found to be the best approach for adult screening of: hepatitis, with accuracy 88.66 %, precision 74.46 %, recall 75.17 %, F1-score 74.81 %, and RMSE value 0.244; dermatology conditions, with accuracy 97.29 %, precision 96.96 %, recall 96.96 %, F1-score 96.96 %, and RMSE value 0.173; hepatic disease, with accuracy 91.58 %, precision 81.76 %, recall 81.82 %, F1-score 81.79 %, and RMSE value 0.193; and autism, with accuracy 100 %, precision 100 %, recall 100 %, F1-score 100 %, and RMSE value 0.064.
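For readers unfamiliar with the validity measures reported above, the sketch below shows how accuracy, precision, recall, F1-score, and RMSE can be computed for a binary classifier from true and predicted labels. It is an illustration under stated assumptions: the abstract does not say how multi-class results were averaged or whether RMSE was computed on hard labels or on predicted probabilities, so the function `binary_metrics` and the toy label vectors are hypothetical and use hard 0/1 labels.

```python
import math


def binary_metrics(y_true, y_pred):
    """Compute standard metrics for 0/1 labels, treating 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # RMSE between true and predicted values (hard labels here, by assumption).
    rmse = math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "rmse": rmse,
    }


# Hypothetical toy labels: 3 TP, 1 FN, 1 FP, 3 TN.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

On this toy input, accuracy, precision, recall, and F1-score all equal 0.75 and the RMSE is 0.5. Note that accuracy can be much higher than precision and recall on imbalanced data, which is one reason the abstract reports them separately.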

Conclusion

The overall performance of our proposed framework suggests that it could assist medical experts in more accurately identifying and diagnosing patients with hepatitis, dermatology conditions, hepatic disease, and autism.

Source journal: Intelligent medicine
Subject areas: Surgery, Radiology and Imaging, Artificial Intelligence, Biomedical Engineering
CiteScore: 5.20
Self-citation rate: 0.00%
Articles published: 19