Early Thyroid Risk Prediction by Data Mining and Ensemble Classifiers

Impact factor: 4.0; JCR Q2 (Computer Science, Artificial Intelligence)
Mohammad H. Alshayeji
Journal: Machine Learning and Knowledge Extraction (published 2023-09-18)
DOI: 10.3390/make5030061 (https://doi.org/10.3390/make5030061)
Citation count (Semantic Scholar record): 0

Abstract

Thyroid disease is among the most prevalent endocrinopathies worldwide. Because the thyroid gland regulates human metabolism, thyroid illness is a significant concern for human health. An automatic, reliable, and accurate machine-learning (ML) system for identifying thyroid disease is therefore needed to save time and reduce error rates. The proposed model addresses limitations of existing work: the lack of detailed feature analysis and visualization, and the need for better prediction accuracy and reliability. A public thyroid disease dataset with 29 clinical features from the University of California, Irvine (UCI) Machine Learning Repository was used to build an ML model that predicts thyroid illness from early symptoms, replacing manual analysis of these attributes. Feature analysis and visualization clarify the role each feature plays in thyroid prediction. Overfitting was addressed with 5-fold cross-validation and by balancing the classes with the synthetic minority oversampling technique (SMOTE). Ensemble learning improves the reliability of the model because multiple classifiers contribute to each prediction. With the boosting method, the proposed model achieved 99.5% accuracy, 99.39% sensitivity, and 99.59% specificity, making it applicable to real-time computer-aided diagnosis (CAD) systems that ease diagnosis and promote early treatment.
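The abstract names three evaluation ingredients (5-fold cross-validation, SMOTE class balancing, and accuracy/sensitivity/specificity) without showing how they fit together. The following is a minimal pure-Python sketch under stated assumptions, not the author's implementation. The function names, the toy data, the use of Euclidean distance, and the choice to oversample only the training folds are illustrative assumptions. The abstract does not say in which order balancing and cross-validation were applied, and the boosting classifier itself is omitted.

```python
import math
import random


def accuracy_sensitivity_specificity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity, specificity) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn)  # true-positive rate: diseased cases caught
    specificity = tn / (tn + fp)  # true-negative rate: healthy cases cleared
    return accuracy, sensitivity, specificity


def smote(minority, n_new, k=5, seed=0):
    """SMOTE-style oversampling (illustrative): each synthetic point lies on
    the segment between a random minority sample and one of its k nearest
    minority neighbours (Euclidean). Needs at least 2 minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [x for j, x in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


def k_fold_splits(n_samples, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) for shuffled k-fold CV;
    every sample lands in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    for f in range(n_folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield train, folds[f]


def balance_training_fold(X, y, train, positive=1, seed=0):
    """Hypothetical helper: oversample the positive (minority) class using
    the training indices only, so no synthetic point comes from a test row."""
    X_tr = [X[i] for i in train]
    y_tr = [y[i] for i in train]
    minority = [x for x, t in zip(X_tr, y_tr) if t == positive]
    n_new = (len(y_tr) - len(minority)) - len(minority)
    if n_new > 0:  # only oversample when positives are the minority
        X_tr += smote(minority, n_new, seed=seed)
        y_tr += [positive] * n_new
    return X_tr, y_tr


if __name__ == "__main__":
    # Toy data (hypothetical): 30 samples, 2 features, 10 positives.
    X = [[float(i), float(i % 7)] for i in range(30)]
    y = [1 if i % 3 == 0 else 0 for i in range(30)]
    for train, test in k_fold_splits(len(X), n_folds=5):
        X_tr, y_tr = balance_training_fold(X, y, train)
        print(len(train), "->", len(X_tr), "training rows;", sum(y_tr), "positive")
    print(accuracy_sensitivity_specificity([1, 1, 1, 0, 0, 0, 0, 0],
                                           [1, 1, 0, 0, 0, 0, 0, 1]))
```

Oversampling inside each training fold, rather than before splitting, keeps synthetic points out of the test fold and stops test samples from shaping the synthetic training data. In practice, library implementations such as imbalanced-learn's `SMOTE` and scikit-learn's `StratifiedKFold` are the usual choice over hand-written versions.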
Source journal: CiteScore 6.30; self-citation rate 0.00%; articles published: 0; review time: 7 weeks