A Comparative Analysis of Machine Learning techniques on Breast Cancer diagnosis using WEKA

2022 25th International Conference on Computer and Information Technology (ICCIT) Pub Date : 2022-12-17 DOI:10.1109/ICCIT57492.2022.10055421

Afrah Rashid, Syeda Sohana Binta Farhad, Afsana Bhuyian, N. Yeasmin, Mohammad Abdul Azim, Z. Alom


Citations: 1

Abstract

Breast cancer is one of the most common malignancies affecting women worldwide and causes many deaths every year. The risk of death from breast cancer is increasing exponentially. With the surge of research in the medical field, timely and, where possible, early detection of the disease has become a pressing need. Until now, radiologists have examined cancer images and diagnosed them manually. Research has shown that a considerable number of ultrasound images are produced every day. However, the number of radiologists is limited, so they cannot always provide service on time. Moreover, they often misclassify breast lesions, resulting in a high false-positive rate. An automatic disease-detection system assists radiologists in diagnosis, provides reliable and efficient support, and reduces the risk of death. In this paper, we compare six machine learning models, namely (i) Support Vector Machine (SVM), (ii) Naive Bayes (NB), (iii) Logistic Regression (LR), (iv) Decision Tree (DT), (v) Random Forest (RF), and (vi) k-Nearest Neighbors (k-NN), on two datasets: (i) the Wisconsin Breast Cancer Dataset (WBCD) and (ii) the Breast Cancer Coimbra Dataset (BCCD). The study builds these classification models, analyzes the results obtained, and compares them on the task of predicting breast cancer. We use several performance metrics to select the best classification model among them. Our comparative analysis shows that SVM models achieve better performance metrics, which makes the resulting model relevant for use in clinical applications.
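The abstract does not name the performance metrics that were used. As a minimal sketch, assuming the common choices of accuracy, precision, recall, and F1 computed from a binary confusion matrix, the model-selection step could look like the following. The class `ConfusionMatrix`, the function `best_model`, the model names, and all counts are hypothetical and for illustration only; they are not results from the paper and do not use the WEKA API.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix; 'positive' is assumed to mean malignant."""
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant (false alarms)
    fn: int  # malignant cases predicted benign (missed cancers)
    tn: int  # benign cases predicted benign

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        denom = self.tp + self.fp
        return self.tp / denom if denom else 0.0

    def recall(self) -> float:
        denom = self.tp + self.fn
        return self.tp / denom if denom else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if (p + r) else 0.0


def best_model(results: Mapping[str, ConfusionMatrix], metric: str = "f1") -> str:
    """Return the name of the model with the highest value of `metric`."""
    return max(results, key=lambda name: getattr(results[name], metric)())


# Placeholder counts for illustration only; NOT results from the paper.
toy_results = {
    "model_a": ConfusionMatrix(tp=40, fp=10, fn=10, tn=40),
    "model_b": ConfusionMatrix(tp=45, fp=5, fn=5, tn=45),
}

if __name__ == "__main__":
    for name, cm in toy_results.items():
        print(name, round(cm.accuracy(), 3), round(cm.recall(), 3), round(cm.f1(), 3))
    print("selected:", best_model(toy_results, metric="f1"))
```

In a diagnostic setting the choice of `metric` matters: recall penalizes missed cancers, while precision penalizes the false positives that the abstract highlights, so the ranking of models can differ depending on which one is used.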
