Juan Jose Cordova Calle, John Xavier Farez Villa, Remigio Ismael Hurtado Ortiz
2022 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), published 2022-11-09. DOI: 10.1109/ROPEC55836.2022.10018755 (https://doi.org/10.1109/ROPEC55836.2022.10018755)
An analysis method for predicting breast cancer using data science processes and machine learning
In two decades, the number of people with breast cancer has almost doubled: from about 10 million patients in 2000 to 19 million in 2020. An estimated one in five people alive today will develop some form of cancer in their lifetime. Studies suggest that the number of people diagnosed with cancer will keep increasing, and will be approximately 50% higher in 2040 than in 2020. This article presents an analysis method to predict or diagnose breast cancer using data science processes and machine learning. The method is structured into three phases: data preparation, predictive analysis, and evaluation. Predictions are made with three machine learning techniques: k-nearest neighbors (KNN), a gradient boosting classifier, and random forest. Their results are reported with the following quality measures: Accuracy, Precision, Recall, and F1-Score. The dataset selected for the analysis is the Wisconsin breast cancer dataset [1]. The same analysis can be extended to other learning techniques and reused in future scientific work, such as disease prediction or medicine in general.
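The three phases described in the abstract can be illustrated with a minimal sketch in Python using scikit-learn. This is not the authors' implementation. It assumes the diagnostic variant of the Wisconsin dataset that ships with scikit-learn (`load_breast_cancer`), a stratified 75/25 hold-out split, feature scaling for KNN only, and library-default hyperparameters. None of these choices is stated in the abstract, and the function name `run_analysis` is hypothetical.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# In scikit-learn's copy of the dataset, label 0 is malignant and 1 is benign.
MALIGNANT = 0


def run_analysis(test_size=0.25, seed=0):
    """Return {model name: {metric name: score}} for the three classifiers."""
    # Phase 1: data preparation (assumption: stratified hold-out split).
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )

    # Phase 2: predictive analysis with the three techniques named in the
    # abstract. Hyperparameters are library defaults (an assumption).
    models = {
        # KNN is distance-based, so features are standardized first.
        "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
        "Gradient boosting": GradientBoostingClassifier(random_state=seed),
        "Random forest": RandomForestClassifier(random_state=seed),
    }

    # Phase 3: evaluation, treating "malignant" as the positive class.
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision_score(y_test, y_pred, pos_label=MALIGNANT),
            "recall": recall_score(y_test, y_pred, pos_label=MALIGNANT),
            "f1": f1_score(y_test, y_pred, pos_label=MALIGNANT),
        }
    return results


if __name__ == "__main__":
    for name, scores in run_analysis().items():
        print(name, {metric: round(value, 3) for metric, value in scores.items()})
```

Setting `pos_label=MALIGNANT` makes Recall the fraction of malignant cases the model detects, which is usually the quantity of interest in a diagnostic setting. With scikit-learn's default (`pos_label=1`), the same metrics would instead describe the benign class.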