使用混合数据级、成本敏感和集合方法增强不平衡医学数据集的分类能力

Ayushi Gupta, Shikha Gupta
{"title":"使用混合数据级、成本敏感和集合方法增强不平衡医学数据集的分类能力","authors":"Ayushi Gupta, Shikha Gupta","doi":"10.54392/irjmt2435","DOIUrl":null,"url":null,"abstract":"Addressing the class imbalance in classification problems is particularly challenging, especially in the context of medical datasets where misclassifying minority class samples can have significant repercussions. This study is dedicated to mitigating class imbalance in medical datasets by employing a hybrid approach that combines data-level, cost-sensitive, and ensemble methods. Through an assessment of the performance, measured by AUC-ROC values, Sensitivity, F1-Score, and G-Mean of 20 data-level and four cost-sensitive models on seventeen medical datasets - 12 small and five large, a hybridized model, SMOTE-RF-CS-LR has been devised. This model integrates the Synthetic Minority Oversampling Technique (SMOTE), the ensemble classifier Random Forest (RF), and the Cost-Sensitive Logistic Regression (CS-LR). Upon testing the hybridized model on diverse imbalanced ratios, it demonstrated remarkable performance, achieving outstanding performance values on the majority of the datasets. Further examination of the model's training duration and time complexity revealed its efficiency, taking less than a second to train on each small dataset. Consequently, the proposed hybridized model not only proves to be time-efficient but also exhibits robust capabilities in handling class imbalance, yielding outstanding classification results in the context of medical datasets.","PeriodicalId":14412,"journal":{"name":"International Research Journal of Multidisciplinary Technovation","volume":"45 7","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Enhanced Classification of Imbalanced Medical Datasets using Hybrid Data-Level, Cost-Sensitive and Ensemble Methods\",\"authors\":\"Ayushi Gupta, Shikha Gupta\",\"doi\":\"10.54392/irjmt2435\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Addressing the class imbalance in classification problems is particularly challenging, especially in the context of medical datasets where misclassifying minority class samples can have significant repercussions. This study is dedicated to mitigating class imbalance in medical datasets by employing a hybrid approach that combines data-level, cost-sensitive, and ensemble methods. Through an assessment of the performance, measured by AUC-ROC values, Sensitivity, F1-Score, and G-Mean of 20 data-level and four cost-sensitive models on seventeen medical datasets - 12 small and five large, a hybridized model, SMOTE-RF-CS-LR has been devised. This model integrates the Synthetic Minority Oversampling Technique (SMOTE), the ensemble classifier Random Forest (RF), and the Cost-Sensitive Logistic Regression (CS-LR). Upon testing the hybridized model on diverse imbalanced ratios, it demonstrated remarkable performance, achieving outstanding performance values on the majority of the datasets. Further examination of the model's training duration and time complexity revealed its efficiency, taking less than a second to train on each small dataset. Consequently, the proposed hybridized model not only proves to be time-efficient but also exhibits robust capabilities in handling class imbalance, yielding outstanding classification results in the context of medical datasets.\",\"PeriodicalId\":14412,\"journal\":{\"name\":\"International Research Journal of Multidisciplinary Technovation\",\"volume\":\"45 7\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-04-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Research Journal of Multidisciplinary Technovation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.54392/irjmt2435\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Research Journal of Multidisciplinary Technovation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.54392/irjmt2435","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

解决分类问题中的类不平衡问题尤其具有挑战性,特别是在医疗数据集中,误分类少数类样本可能会产生重大影响。本研究采用一种混合方法,将数据级方法、成本敏感方法和集合方法结合起来,致力于减轻医疗数据集中的类不平衡问题。通过评估 20 个数据级模型和 4 个成本敏感模型在 17 个医疗数据集(12 个小型数据集和 5 个大型数据集)上的 AUC-ROC 值、灵敏度、F1-分数和 G-Mean 的性能,设计出了一个混合模型 SMOTE-RF-CS-LR。该模型集成了合成少数群体过度采样技术(SMOTE)、集合分类器随机森林(RF)和成本敏感逻辑回归(CS-LR)。在对各种不平衡比率进行混合模型测试后,该模型表现出了卓越的性能,在大多数数据集上都取得了出色的性能值。对模型的训练时间和时间复杂度的进一步检查显示了它的效率,在每个小数据集上的训练时间都不到一秒钟。因此,所提出的混合模型不仅省时高效,而且在处理类不平衡方面表现出强大的能力,在医学数据集方面取得了出色的分类结果。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Enhanced Classification of Imbalanced Medical Datasets using Hybrid Data-Level, Cost-Sensitive and Ensemble Methods
Addressing the class imbalance in classification problems is particularly challenging, especially in the context of medical datasets where misclassifying minority class samples can have significant repercussions. This study is dedicated to mitigating class imbalance in medical datasets by employing a hybrid approach that combines data-level, cost-sensitive, and ensemble methods. Through an assessment of the performance, measured by AUC-ROC values, Sensitivity, F1-Score, and G-Mean of 20 data-level and four cost-sensitive models on seventeen medical datasets - 12 small and five large, a hybridized model, SMOTE-RF-CS-LR has been devised. This model integrates the Synthetic Minority Oversampling Technique (SMOTE), the ensemble classifier Random Forest (RF), and the Cost-Sensitive Logistic Regression (CS-LR). Upon testing the hybridized model on diverse imbalanced ratios, it demonstrated remarkable performance, achieving outstanding performance values on the majority of the datasets. Further examination of the model's training duration and time complexity revealed its efficiency, taking less than a second to train on each small dataset. Consequently, the proposed hybridized model not only proves to be time-efficient but also exhibits robust capabilities in handling class imbalance, yielding outstanding classification results in the context of medical datasets.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
0.50
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信