Unlocking the Potential of Machine Learning for Accurate Diagnosis of Breast Cancer

Rinku Soni, Saeedah Zaina, Dr. Y. L. Malathi Latha
Published in: 2023 3rd International Conference on Intelligent Technologies (CONIT)
Publication date: 2023-06-23
DOI: 10.1109/CONIT59222.2023.10205897
URL: https://doi.org/10.1109/CONIT59222.2023.10205897

Abstract

Breast cancer is a major health concern affecting women worldwide, and early detection is crucial for successful treatment. Machine learning is a promising way to improve the accuracy of breast cancer diagnosis and reduce diagnostic errors. This study aims to improve diagnostic accuracy by training on balanced data and comparing machine learning classifiers with and without feature selection. We used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset and balanced it with both oversampling and undersampling. We evaluated eight classification models and five feature selection techniques, comparing classifier performance on undersampled and oversampled data, with and without feature selection. MLflow was used to monitor the algorithms and keep a record of their performance. Oversampling improved model performance more than undersampling did. Of all the models, Logistic Regression achieved the highest accuracy, on the oversampled data without feature selection. Feature selection yielded slightly lower accuracy than the base models, which suggests that its benefits were not large enough to compensate for the information lost by removing features. The study underscores the efficacy of machine learning in breast cancer diagnosis and highlights the potential of machine learning algorithms to improve the accuracy of cancer detection.
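The abstract does not say which oversampling and undersampling methods were used, so the following is only a minimal sketch of the general idea, assuming plain random resampling (the study may well have used something else, such as SMOTE). The function names `random_oversample` and `random_undersample` are hypothetical, and the rows are placeholder indices rather than real feature vectors. The class counts are those of the WDBC dataset: 357 benign ("B") and 212 malignant ("M") samples.

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Collect rows into a dict keyed by class label, preserving order."""
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    return by_class


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes (with replacement)
    until every class is as large as the largest one.
    Hypothetical helper; assumes simple random oversampling."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset (without replacement) of larger classes
    so that every class is as small as the smallest one.
    Hypothetical helper; assumes simple random undersampling."""
    rng = random.Random(seed)
    by_class = _group_by_class(rows, labels)
    target = min(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Placeholder data with the WDBC class distribution (row = sample index).
rows = list(range(569))
labels = ["B"] * 357 + ["M"] * 212

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)

print(Counter(over_labels))   # both classes at 357
print(Counter(under_labels))  # both classes at 212
```

Oversampling keeps all 569 original samples and adds duplicates of the minority class, whereas undersampling discards 145 benign samples. In practice, resampling is normally applied to the training split only, so that duplicated samples do not leak into the test set.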