Effective utilization of Machine Learning Techniques to Classify Breast Cancer Tumors

2022 IEEE Pune Section International Conference (PuneCon). Pub Date: 2022-12-15. DOI: 10.1109/PuneCon55413.2022.10014940

Gauri Kamath, A. Phadke


Abstract

Breast cancer occurs when mutations in genes cause abnormal cell growth in the breast. Machine learning techniques offer one way to make progress in this field, both to diagnose the disease better and to attempt to cure it. This paper aims to identify breast cancer tumors quickly and efficiently. The proposed system uses the Wisconsin Breast Cancer Dataset, downloaded from the UCI repository, and performs binary classification, labeling tumors as malignant or benign. Classification is implemented with Support Vector Machines (SVM) and Random Forest. To understand the trends and patterns in the dataset, a thorough data visualization has been conducted. The system applies data processing techniques to retain useful data, followed by Principal Component Analysis for feature extraction. For SVM, Grid Search CV is used to iterate over a predefined set of hyperparameters. For Random Forest, k-fold cross-validation has been applied to obtain a unique set of results. The highest accuracy achieved is 99.7% with Random Forest and 98.2% with SVM. These two algorithms are highlighted because their implementation yielded high accuracy. The models have been evaluated by computing precision, recall, F1 score, and the confusion matrix, and compared using true positive rate, true negative rate, false positive rate, and false negative rate.
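The evaluation measures named in the abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of that relationship, not the authors' code: it assumes malignant is the positive class, the labels are made up for illustration and are not results from the paper, and the names `ConfusionCounts`, `confusion_counts`, and `evaluate` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (assumption: malignant = positive)."""
    tp: int  # malignant predicted as malignant
    fp: int  # benign predicted as malignant
    tn: int  # benign predicted as benign
    fn: int  # malignant predicted as benign


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells from paired labels."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp=tp, fp=fp, tn=tn, fn=fn)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c):
    """Derive the measures listed in the abstract from the four counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "f1": _ratio(2 * precision * recall, precision + recall),
        "tpr": recall,                        # true positive rate equals recall
        "tnr": _ratio(c.tn, c.tn + c.fp),     # true negative rate
        "fpr": _ratio(c.fp, c.fp + c.tn),     # false positive rate
        "fnr": _ratio(c.fn, c.fn + c.tp),     # false negative rate
    }


# Made-up labels for illustration only (1 = malignant, 0 = benign).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = evaluate(counts)
print(counts)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the true positive and false negative rates share a denominator, they sum to 1, as do the true negative and false positive rates. In a tumor classifier the false negative rate is the share of malignant cases that are missed, which makes it worth reporting alongside overall accuracy.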


