An analysis method for predicting breast cancer using data science processes and machine learning

2022 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC) Pub Date : 2022-11-09 DOI:10.1109/ROPEC55836.2022.10018755

Juan Jose Cordova Calle, John Xavier Farez Villa, Remigio Ismael Hurtado Ortiz

Abstract

In two decades, the number of people with breast cancer has almost doubled: in 2000, about 10 million patients had the disease, and by 2020 the figure had reached 19 million. An estimated one in five people alive today will develop some form of cancer during their lifetime, and studies suggest that the number of people diagnosed with cancer will keep rising, reaching roughly 50% more in 2040 than in 2020. This article presents an analysis method for predicting, or diagnosing, breast cancer using data science processes and machine learning. The method is structured into three phases: data preparation, predictive analysis, and evaluation. In the predictive phase, three machine learning techniques are tested: k-nearest neighbors (KNN), a gradient boosting classifier, and random forest. Their performance is evaluated with four quality measures: accuracy, precision, recall, and F1-score. The dataset used for this analysis is the Wisconsin breast cancer dataset [1]. These data analysis techniques can be extended to other learning techniques and applied in future scientific work on disease prediction and in medicine more generally.
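The three phases described above (data preparation, predictive analysis with KNN, gradient boosting and random forest, then evaluation with accuracy, precision, recall and F1-score) can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes scikit-learn's bundled `load_breast_cancer` data (the Wisconsin Diagnostic Breast Cancer set, which may not be the exact Wisconsin variant cited as [1]), an 80/20 stratified split, standardized features for KNN only, and library-default hyperparameters. The helper names `prepare_data`, `build_models`, `evaluate` and `run_all` are hypothetical.

```python
# Sketch of a three-phase breast-cancer classification workflow.
# Assumptions (not from the paper): scikit-learn's copy of the Wisconsin
# Diagnostic Breast Cancer data, an 80/20 stratified split, default
# hyperparameters, and hypothetical helper names.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# In scikit-learn's copy of the data, label 0 is malignant and 1 is benign.
MALIGNANT = 0


def prepare_data(test_size=0.2, seed=42):
    """Phase 1: load features/labels and hold out a stratified test split."""
    X, y = load_breast_cancer(return_X_y=True)
    return train_test_split(X, y, test_size=test_size, stratify=y, random_state=seed)


def build_models(seed=42):
    """Phase 2: the three classifiers named in the abstract."""
    return {
        # KNN is distance-based, so features are standardized first.
        "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
        # Tree ensembles are insensitive to feature scale, so no scaler here.
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
        "random_forest": RandomForestClassifier(random_state=seed),
    }


def evaluate(model, X_test, y_test):
    """Phase 3: the four quality measures, with malignant as the positive class."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, pos_label=MALIGNANT),
        "recall": recall_score(y_test, y_pred, pos_label=MALIGNANT),
        "f1": f1_score(y_test, y_pred, pos_label=MALIGNANT),
    }


def run_all():
    """Fit every model on the training split and score it on the test split."""
    X_train, X_test, y_train, y_test = prepare_data()
    results = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        results[name] = evaluate(model, X_test, y_test)
    return results


if __name__ == "__main__":
    for name, scores in run_all().items():
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Treating malignant tumours as the positive class (`pos_label=0`) makes recall the fraction of cancers the model actually catches, which is usually the measure a screening setting cares most about. The scores depend on the split seed and default settings, so they should not be read as a reproduction of the paper's results.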
