A comparative study on breast cancer classification with stratified shuffle split and K-fold cross validation via ensembled machine learning

IF 1.7, Zone 4 (multidisciplinary journals), Q2 MULTIDISCIPLINARY SCIENCES

Journal of Radiation Research and Applied Sciences Pub Date : 2024-08-22 DOI:10.1016/j.jrras.2024.101080

Serhat Ünalan, Osman Günay, Iskender Akkurt, Kadir Gunoglu, H.O. Tekin

Volume 17, Issue 4, Article 101080. Full text: https://www.sciencedirect.com/science/article/pii/S1687850724002644

Citations: 0

Abstract

In breast cancer, early diagnosis and appropriate treatment are of paramount importance for improving survival rates. Using a comprehensive dataset that includes clinical and genomic information, this study evaluates analytical techniques for breast cancer classification with four machine learning algorithms. Classification results differed notably across methods, underscoring the need for well-chosen analytical tools to improve classification accuracy. Among the individual algorithms, LGBM achieved the highest F1 score, 99.2%, with an accuracy of 98.9%. Ensembles comprising AdaBoost, GBM, and RGF outperformed the individual techniques, reaching 99.5% accuracy. The best-performing ensembles prioritize features such as worst texture, worst concave points, mean concave points, and mean texture, which are crucial for classification. At the heart of the study is an examination of the advantages of ensemble learning methods, which combine predictions from several classifiers to improve classification performance. In particular, the study shows how k-fold and stratified shuffle split cross-validation differ in their classification results, giving clinicians a clearer understanding of the clinical implications and helping identify the tumor traits that distinguish malignant from benign cases.
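The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. It assumes scikit-learn's bundled Wisconsin diagnostic dataset as a stand-in, since the abstract does not identify the data, and a soft-voting ensemble of AdaBoost and GBM only, because RGF is not part of scikit-learn. The fold count, test size, and estimator settings are illustrative choices as well.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    VotingClassifier,
)
from sklearn.model_selection import KFold, StratifiedShuffleSplit, cross_val_score

# Assumption: this bundled dataset stands in for the study's data.
data = load_breast_cancer()
X, y = data.data, data.target

# Assumption: soft voting over AdaBoost and GBM; RGF is left out.
ensemble = VotingClassifier(
    estimators=[
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
        ("gbm", GradientBoostingClassifier(n_estimators=50, random_state=0)),
    ],
    voting="soft",
)

# The two resampling schemes compared in the abstract.
# K-fold partitions the data into disjoint test folds; stratified shuffle
# split draws repeated random test sets that preserve the class ratio.
splitters = {
    "k-fold": KFold(n_splits=5, shuffle=True, random_state=0),
    "stratified shuffle split": StratifiedShuffleSplit(
        n_splits=5, test_size=0.2, random_state=0
    ),
}

scores = {}
for name, cv in splitters.items():
    scores[name] = cross_val_score(ensemble, X, y, cv=cv, scoring="accuracy")
    print(f"{name}: mean accuracy = {scores[name].mean():.3f}")

# Rank features by importance in a single GBM fitted on all the data.
gbm = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)
order = gbm.feature_importances_.argsort()[::-1][:4]
top_features = [str(data.feature_names[i]) for i in order]
print("top features:", top_features)
```

The accuracies and feature ranking this prints depend on the stand-in data and settings, so they should not be expected to match the figures reported in the abstract.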

Source journal: Journal of Radiation Research and Applied Sciences (MULTIDISCIPLINARY SCIENCES)

Self-citation rate: 5.90%

Articles published: 130

Review time: 16 weeks

About the journal: Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear science, radiation, and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid state, engineering, and the environmental and applied sciences.