A comparative study on breast cancer classification with stratified shuffle split and K-fold cross validation via ensembled machine learning

Impact factor 1.7; Zone 4 (multidisciplinary journals); JCR Q2 (Multidisciplinary Sciences)
Serhat Ünalan, Osman Günay, Iskender Akkurt, Kadir Gunoglu, H.O. Tekin
Journal of Radiation Research and Applied Sciences, Volume 17, Issue 4, Article 101080. Published 2024-08-22.
DOI: 10.1016/j.jrras.2024.101080
Source: https://www.sciencedirect.com/science/article/pii/S1687850724002644
Citations: 0

Abstract

In breast cancer, early diagnosis and treatment are of paramount importance for improving survival rates. Using a comprehensive dataset that includes clinical and genomic information, this study evaluates breast cancer classification with four different machine learning algorithms. The classification results differed notably across methods, underscoring the need for well-chosen analytical tools to improve classification accuracy. Among the individual algorithms, LGBM achieved the highest F1 score (99.2%), with an accuracy of 98.9%. Ensembles comprising AdaBoost, GBM, and RGF outperformed the individual techniques, reaching 99.5% accuracy. The best ensembles prioritize features such as worst texture, worst concave points, mean concave points, and mean texture, which are crucial for the classification. At the heart of the study is an examination of the advantages of ensemble learning methods, which combine predictions from many classifiers to improve classification performance. In particular, the study shows how k-fold and stratified shuffle split cross-validation differ in their classification results, giving clinicians a clearer understanding of the clinical ramifications and helping to identify the tumor traits that distinguish malignant from benign cases.
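
The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: it assumes scikit-learn is available, uses scikit-learn's bundled Wisconsin diagnostic breast cancer data as a stand-in for the study's dataset, and builds a soft-voting ensemble from AdaBoost and GBM only (RGF and LGBM are left out because they need third-party packages). The split counts, estimator counts, and random seeds are arbitrary choices made for the example.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
)
from sklearn.model_selection import (
    KFold,
    StratifiedShuffleSplit,
    cross_val_score,
)

# Stand-in data (assumption): scikit-learn's bundled breast cancer dataset,
# not necessarily the dataset used in the paper.
X, y = load_breast_cancer(return_X_y=True)

# Soft-voting ensemble: averages the class probabilities of both models.
ensemble = VotingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)

# K-fold: the data is cut into 5 disjoint folds, so every sample is tested
# exactly once; plain KFold does not balance classes within a fold.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# Stratified shuffle split: 5 independent random train/test splits, each
# preserving the overall class ratio; test sets may overlap between splits.
sss = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)


def evaluate(cv):
    """Return the mean accuracy of the ensemble under the given splitter."""
    scores = cross_val_score(ensemble, X, y, cv=cv, scoring="accuracy")
    return scores.mean()


kfold_acc = evaluate(kfold)
sss_acc = evaluate(sss)
print(f"K-fold accuracy:                   {kfold_acc:.3f}")
print(f"Stratified shuffle split accuracy: {sss_acc:.3f}")
```

The two splitters answer slightly different questions: k-fold guarantees full coverage of the data, while stratified shuffle split keeps the malignant-to-benign ratio fixed in every test set, so a gap between the two scores mostly reflects how sensitive the model is to the way the data is partitioned.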

Source journal
Self-citation rate: 5.90%
Articles published: 130
Review time: 16 weeks
Journal description: Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear, radiation and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid states, engineering, environmental and applied sciences.