Seyed Matin Malakouti, Mohammad Bagher Menhaj, Amir Abolfazl Suratgar
ML: Early Breast Cancer Diagnosis
Current problems in cancer. Case reports (Journal Article), published 2024-01-09
DOI: 10.1016/j.cpccr.2024.100278
https://www.sciencedirect.com/science/article/pii/S2666621924000012
Abstract
Breast cancer is the most common malignancy among women worldwide, often characterized by the uncontrolled proliferation of breast cells, leading to the formation of lumps or tumors that can be detected through medical imaging such as X-rays. Distinguishing between benign and malignant tumors presents a significant challenge in the diagnosis of breast cancer.
In this study, machine learning methods, including Logistic Regression, Gradient Boosting, AdaBoost, Random Forest, and Gaussian Naive Bayes (Gaussian NB) with grid search, were used to differentiate between healthy individuals and those with malignancies. The Random Forest algorithm performed best in predicting breast cancer, correctly identifying 99% of both healthy and affected individuals. Gradient Boosting and AdaBoost reached a similar level of accuracy, each correctly distinguishing 98% of healthy and affected individuals.
Conversely, Gaussian NB performed worst, with an accuracy of 91% in differentiating between healthy and affected individuals, indicating a comparatively lower predictive capability for breast cancer.
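The comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not name the dataset, the train/test split, or the hyperparameter grids, so the scikit-learn Wisconsin diagnostic dataset (`load_breast_cancer`), the 80/20 stratified split, the 5-fold grid search, and every grid value below are assumptions chosen for the example. The function name `compare_models` is likewise hypothetical. Accuracies from this sketch should not be expected to match the figures reported in the paper.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def compare_models(random_state=0):
    """Grid-search five classifiers and return held-out accuracy per model."""
    # Assumption: a public stand-in dataset; the paper's data may differ.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )

    # Assumption: small illustrative grids, not the grids used in the study.
    candidates = {
        "Logistic Regression": (
            make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
            {"logisticregression__C": [0.1, 1.0, 10.0]},
        ),
        "Gradient Boosting": (
            GradientBoostingClassifier(random_state=random_state),
            {"n_estimators": [50, 100]},
        ),
        "AdaBoost": (
            AdaBoostClassifier(random_state=random_state),
            {"n_estimators": [50, 100]},
        ),
        "Random Forest": (
            RandomForestClassifier(random_state=random_state),
            {"n_estimators": [50, 100]},
        ),
        "Gaussian NB": (
            GaussianNB(),
            {"var_smoothing": [1e-9, 1e-8, 1e-7]},
        ),
    }

    scores = {}
    for name, (estimator, grid) in candidates.items():
        # 5-fold cross-validated grid search on the training split only.
        search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
        search.fit(X_train, y_train)
        # Accuracy of the refitted best estimator on the held-out split.
        scores[name] = search.score(X_test, y_test)
    return scores


if __name__ == "__main__":
    for model_name, accuracy in compare_models().items():
        print(f"{model_name}: {accuracy:.3f}")
```

Tuning on the training split and scoring once on a held-out split keeps the reported accuracy separate from the data used to pick hyperparameters. Accuracy alone does not separate missed malignancies from false alarms, so a per-class report (for example `sklearn.metrics.classification_report`) would be a natural addition.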