Prediction of breast cancer using machine learning algorithms on different datasets

IF 0.4 Q4 ENGINEERING, MULTIDISCIPLINARY
Ömer Çağrı Yavuz, M. H. Calp, Hazel Ceren Erkengel
Journal: Ingenieria Solidaria
Published: 2023-06-14
DOI: 10.16925/2357-6014.2023.01.08 (https://doi.org/10.16925/2357-6014.2023.01.08)
Publication type: Journal Article
Citations: 0

Abstract

Breast cancer is an increasingly common disease that causes emotional and behavioral reactions and can be fatal if not detected early. Traditional methods are insufficient, especially for early diagnosis. This study therefore aimed to predict breast cancer by applying machine learning (ML) algorithms to different datasets and to demonstrate the applicability of these algorithms. Algorithm performance was compared on balanced and unbalanced datasets using the performance metrics obtained on each dataset. In addition, a model based on the Borda voting method was developed that combines the results of four algorithms: Naive Bayes (NB), k-nearest neighbors (KNN), decision tree (DT), and random forest (RF). The predictions from each algorithm were written to separate columns of the same Excel file, and the most frequent value was accepted as the final result. The developed model was tested on real data consisting of 60 records, and the results were analyzed. The proposed RF model achieved higher performance than similar studies in the literature. Finally, the prediction results obtained with the developed model demonstrated the applicability of ML algorithms to the diagnosis of breast cancer.
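The combination step described in the abstract (one column of predictions per algorithm, with the most frequent value taken as the final result) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the prediction values, the label encoding (1 = malignant, 0 = benign), the function names `vote` and `combine`, and the tie-breaking rule are all assumptions made here, since the abstract does not specify them.

```python
from collections import Counter

# Hypothetical per-model predictions for five records (1 = malignant, 0 = benign).
# Each key plays the role of one spreadsheet column described in the abstract.
predictions = {
    "NB":  [1, 0, 1, 0, 1],
    "KNN": [1, 0, 0, 0, 1],
    "DT":  [0, 0, 1, 1, 1],
    "RF":  [1, 1, 1, 0, 0],
}


def vote(row, tie_break=1):
    """Return the most frequent label among one record's model outputs.

    tie_break is an assumption of this sketch: on a tie the record is
    labelled malignant (1). The abstract does not state a tie rule.
    """
    counts = Counter(row).most_common()
    top_count = counts[0][1]
    winners = [label for label, n in counts if n == top_count]
    return winners[0] if len(winners) == 1 else tie_break


def combine(columns):
    """Apply the vote record by record across all model columns."""
    rows = zip(*columns.values())
    return [vote(row) for row in rows]


final = combine(predictions)
print(final)  # [1, 0, 1, 0, 1]
```

With four voters and two classes, a 2-2 split is possible, so some tie rule is needed. Defaulting to the malignant label is one conservative choice for a screening setting, and other rules (for example, deferring to the best-performing single model) would be equally easy to substitute.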
Source journal: Ingenieria Solidaria (ENGINEERING, MULTIDISCIPLINARY)
Self-citation rate: 0.00%
Articles published: 10