Serhat Ünalan , Osman Günay , Iskender Akkurt , Kadir Gunoglu , H.O. Tekin
Journal of Radiation Research and Applied Sciences, Volume 17, Issue 4, Article 101080. Published 2024-08-22. DOI: 10.1016/j.jrras.2024.101080. https://www.sciencedirect.com/science/article/pii/S1687850724002644
A comparative study on breast cancer classification with stratified shuffle split and K-fold cross validation via ensembled machine learning
In breast cancer, early diagnosis and an appropriate treatment method are of paramount importance for improving survival rates. Using a comprehensive dataset that includes clinical and genomic information, this study assesses the analytical techniques used in breast cancer classification by employing four different machine learning algorithms. The classification results differed notably across techniques, which underscores the need for well-chosen analytical tools to improve the accuracy of breast cancer classification. Among individual algorithms, LGBM achieved the highest F1 score, 99.2%, with an accuracy of 98.9%. Ensembles comprising AdaBoost, GBM, and RGF outperformed the individual techniques, reaching 99.5% accuracy. The best ensemble algorithms prioritize features such as worst texture, worst concave points, mean concave points, and mean texture, which are crucial for the classification. At the heart of this study is an examination of the advantages of ensemble learning methods, which combine predictions from many classifiers to improve classification performance. In particular, the study shows how the k-fold and stratified shuffle split cross-validation methods differ in their classification results, giving clinicians a thorough understanding of the clinical ramifications and helping them decipher the complex facets of breast cancer classification and identify crucial tumor traits that can distinguish malignant from benign cases.
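The difference between the two cross-validation schemes named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the authors' pipeline: the abstract does not state the number of folds, the test size, whether shuffling was used, or which library was employed. The helper names (`k_fold_indices`, `stratified_shuffle_split_indices`, `positive_share`) and the toy labels are hypothetical, and the labels are deliberately sorted by class to make the effect visible. Plain k-fold here takes contiguous blocks without shuffling, which is also the default behavior of scikit-learn's `KFold`; the stratified shuffle split draws a fresh random test set each round while preserving class proportions, as scikit-learn's `StratifiedShuffleSplit` does.

```python
import random


def k_fold_indices(n_samples, n_splits):
    """Yield (train, test) index lists for plain k-fold: contiguous blocks, no shuffling."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def stratified_shuffle_split_indices(labels, n_splits, test_fraction, seed=0):
    """Yield (train, test) index lists; each test set is an independent random
    draw that keeps the class proportions of the full label list."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    for _ in range(n_splits):
        train, test = [], []
        for indices in by_class.values():
            shuffled = indices[:]
            rng.shuffle(shuffled)
            n_test = round(len(shuffled) * test_fraction)
            test.extend(shuffled[:n_test])
            train.extend(shuffled[n_test:])
        yield sorted(train), sorted(test)


def positive_share(labels, indices):
    """Fraction of the given indices whose label is 1."""
    return sum(labels[i] for i in indices) / len(indices)


# Hypothetical toy labels (assumption): 1 = malignant, 0 = benign, sorted by class.
labels = [1] * 40 + [0] * 60

kfold_shares = [positive_share(labels, test)
                for _, test in k_fold_indices(len(labels), 5)]
sss_shares = [positive_share(labels, test)
              for _, test in stratified_shuffle_split_indices(labels, 5, 0.2)]

print(kfold_shares)  # [1.0, 1.0, 0.0, 0.0, 0.0]
print(sss_shares)    # [0.4, 0.4, 0.4, 0.4, 0.4]
```

In this extreme case every unshuffled k-fold test set contains a single class, while every stratified test set keeps the overall 40% malignant share. Note also that the k-fold test sets partition the data, so each sample is tested exactly once, whereas the stratified shuffle split test sets are drawn independently and may overlap.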
About the journal:
Journal of Radiation Research and Applied Sciences provides a high-quality medium for the publication of substantial, original scientific and technological papers on the development and applications of nuclear science, radiation, and isotopes in biology, medicine, drugs, biochemistry, microbiology, agriculture, entomology, food technology, chemistry, physics, solid state, engineering, and environmental and applied sciences.