Breast Cancer Detection with Revamped Dataset Using Machine Learning Techniques

J. Medical Imaging Health Informatics, Pub Date: 2021-12-01, DOI: 10.1166/jmihi.2021.3892

Sundarambal Balaraman, Ramesh Ramamoorthy, R. Krishnamoorthi

Citations: 0

Abstract

Machine learning is a topic of active interest in research and industry, with novel strategies being implemented continually. The main purpose of this research is to determine the effectiveness of machine learning techniques in detecting breast cancer. The incidence and mortality of breast cancer in women are increasing day by day, and researchers worldwide have worked to give clinicians the best models for detecting and diagnosing breast cancer. In this work, models are built on the Wisconsin breast cancer dataset from the UCI Machine Learning Repository, and their performance is analyzed and compared with existing work that uses the same dataset. The dataset is analyzed, and a revamped dataset is constructed by eliminating redundant features and appending new features essential for prediction. Eight machine learning algorithms, namely logistic regression, K-nearest neighbors (KNN), support vector machine (SVM), decision tree, random forest, XGBoost, AdaBoost, and artificial neural network (ANN), are applied to the reorganized dataset to build prediction models, with accuracy as the standard evaluation metric. In the experiments, these classifiers detected breast cancer with more than 97% accuracy. Logistic regression, XGBoost, and AdaBoost performed best, each with 99.28% accuracy. The experiments also show that removing outliers and balancing the dataset have a significant impact on the models' prediction performance.
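The abstract names three preprocessing steps (eliminating redundant features, removing outliers, balancing the classes) and one evaluation metric (accuracy), but does not state the criteria used. The sketch below shows one common way to do each step in plain Python, purely as an illustration under stated assumptions: the 0.95 correlation threshold, the 1.5 x IQR (Tukey) outlier fences, and random oversampling of the minority class are assumptions, not the authors' documented method, and every function name here is hypothetical.

```python
"""Hypothetical preprocessing sketch; thresholds and rules are assumptions."""
import random
import statistics


def pearson(xs, ys):
    # Pearson correlation of two equal-length columns (0.0 if one is constant).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def drop_redundant(columns, threshold=0.95):
    # Greedily keep features in input order, skipping any whose |r| with an
    # already-kept feature reaches the (assumed) threshold.
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def iqr_mask(values, k=1.5):
    # True for values inside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR].
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [q1 - k * iqr <= v <= q3 + k * iqr for v in values]


def remove_outliers(rows, labels, feature_indices):
    # Drop every row that is an IQR outlier on at least one listed feature.
    keep = [True] * len(rows)
    for j in feature_indices:
        for i, ok in enumerate(iqr_mask([r[j] for r in rows])):
            keep[i] = keep[i] and ok
    return ([r for r, k in zip(rows, keep) if k],
            [y for y, k in zip(labels, keep) if k])


def oversample(rows, labels, seed=0):
    # Randomly duplicate rows of smaller classes until all classes are equal size.
    rng = random.Random(seed)
    by_class = {}
    for r, y in zip(rows, labels):
        by_class.setdefault(y, []).append(r)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for r in members + extra:
            out_rows.append(r)
            out_labels.append(y)
    return out_rows, out_labels


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true labels.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up toy columns: "b" is exactly 2 * "a", so it is redundant.
    cols = {"a": [1.0, 2.0, 3.0, 4.0, 5.0],
            "b": [2.0, 4.0, 6.0, 8.0, 10.0],
            "c": [5.0, 1.0, 4.0, 2.0, 3.0]}
    print(drop_redundant(cols))  # ['a', 'c']
    rows = [[float(v)] for v in [1, 2, 3, 4, 5, 6, 7, 8, 100]]
    labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
    rows, labels = remove_outliers(rows, labels, [0])  # drops the 100.0 row
    rows, labels = oversample(rows, labels)
    print(labels.count(0), labels.count(1))  # 5 5
```

In a real pipeline these steps would run on the training split only, before fitting the eight classifiers, so that oversampled duplicates never leak into the data used to measure accuracy.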
