Ömer Çağrı Yavuz, M. H. Calp, Hazel Ceren Erkengel
{"title":"Prediction of breast cancer using machine learning algorithms on different datasets","authors":"Ömer Çağrı Yavuz, M. H. Calp, Hazel Ceren Erkengel","doi":"10.16925/2357-6014.2023.01.08","DOIUrl":null,"url":null,"abstract":"Breast cancer is a disease that is becoming more and more common day by day, causing emotional and behavioral reactions and having fatal consequences if not detected early. At this point, traditional methods are insufficient, especially in early diagnosis. In this context, this study aimed to predict breast cancer by using machine learning (ML) algorithms on different datasets and to demonstrate the applicability of these algorithms. Algorithm performances were compared on balanced and unbalanced datasets, taking into account the performance metrics obtained in applications on different datasets. In addition, a model based on the Borda Voting method was developed by including the results obtained from four different algorithms (NB, KNN, DT, and RF) in the process. The prediction values obtained from each algorithm were written in different columns on the same excel file and the most repetitive value was accepted as the final result value. The developed model was tested on real data consisting of 60 records and the results were analyzed. When the results were examined, it was seen that higher performance was obtained with the proposed RF model compared to similar studies in the literature. Finally, the prediction results obtained with the developed model revealed the applicability of ML algorithms in the diagnosis of breast cancer.","PeriodicalId":41023,"journal":{"name":"Ingenieria Solidaria","volume":null,"pages":null},"PeriodicalIF":0.4000,"publicationDate":"2023-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Ingenieria Solidaria","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.16925/2357-6014.2023.01.08","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"ENGINEERING, MULTIDISCIPLINARY","Score":null,"Total":0}
引用次数: 0
Abstract
Breast cancer is a disease that is becoming more and more common day by day, causing emotional and behavioral reactions and having fatal consequences if not detected early. At this point, traditional methods are insufficient, especially in early diagnosis. In this context, this study aimed to predict breast cancer by using machine learning (ML) algorithms on different datasets and to demonstrate the applicability of these algorithms. Algorithm performances were compared on balanced and unbalanced datasets, taking into account the performance metrics obtained in applications on different datasets. In addition, a model based on the Borda Voting method was developed by including the results obtained from four different algorithms (NB, KNN, DT, and RF) in the process. The prediction values obtained from each algorithm were written in different columns on the same excel file and the most repetitive value was accepted as the final result value. The developed model was tested on real data consisting of 60 records and the results were analyzed. When the results were examined, it was seen that higher performance was obtained with the proposed RF model compared to similar studies in the literature. Finally, the prediction results obtained with the developed model revealed the applicability of ML algorithms in the diagnosis of breast cancer.