Research on 10-year Breast Cancer Survival Prediction Model Based on Mixed Feature Selection
Yufang Deng
2022 2nd International Conference on Bioinformatics and Intelligent Computing, published 2022-01-21
https://doi.org/10.1145/3523286.3524578
Using breast cancer data from 1973 to 2015 in the SEER database, this study selects an optimal feature subset with a hybrid feature selection algorithm, which combines a filter method with a heuristic search algorithm. First, the chi-square test is used to filter out redundant or irrelevant features; then an improved genetic algorithm searches for the best combination of the remaining features. The improvements concern the formulation of the fitness function and the roulette-wheel selection step. The XGBoost classification algorithm is then used to build a 10-year survival prediction model for breast cancer patients. The experimental results show that the hybrid feature selection method reduces the data from 22 features to 6, and that on five evaluation indicators the resulting model outperforms the model built on all features. The accuracy, precision, and AUC of this model are 0.8468, 0.8385, and 0.8181, respectively, which are superior to those of all other models compared.
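The two-stage selection described above (chi-square filter, then genetic search over feature subsets) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the abstract does not give the improved fitness formulation or the modified roulette selection, so the sketch uses ordinary roulette-wheel selection and a hypothetical `toy_fitness` that rewards chi-square score and penalizes subset size. In the paper's setting the fitness would presumably come from a classifier such as XGBoost. The data is synthetic, and all function names and parameters are illustrative.

```python
import random
from collections import Counter


def chi2_score(feature, labels):
    """Pearson chi-square statistic between a categorical feature and the labels."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    stat = 0.0
    for f, f_count in feature_totals.items():
        for y, y_count in label_totals.items():
            expected = f_count * y_count / n
            stat += (observed[(f, y)] - expected) ** 2 / expected
    return stat


def chi2_filter(columns, labels, keep):
    """Filter step: return the indices of the `keep` highest-scoring columns."""
    scores = [chi2_score(col, labels) for col in columns]
    ranked = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep]), scores


def roulette_select(population, fitnesses, rng):
    """Plain roulette-wheel selection: probability proportional to fitness."""
    pick = rng.uniform(0, sum(fitnesses))
    acc = 0.0
    for individual, fit in zip(population, fitnesses):
        acc += fit
        if acc >= pick:
            return individual
    return population[-1]


def genetic_search(n_features, fitness, rng, pop_size=20, generations=30, p_mut=0.05):
    """Search over 0/1 feature masks; fitness must be non-negative, n_features >= 2."""
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        fits = [fitness(ind) for ind in population]
        next_gen = []
        while len(next_gen) < pop_size:
            a = roulette_select(population, fits, rng)
            b = roulette_select(population, fits, rng)
            cut = rng.randrange(1, n_features)          # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen
        best = max(population + [best], key=fitness)    # remember the best mask seen
    return best


# Synthetic data (assumption): six categorical columns of varying usefulness.
rng = random.Random(0)
labels = [rng.randint(0, 1) for _ in range(200)]
columns = [
    labels[:],                                              # perfectly informative
    [y if rng.random() < 0.8 else 1 - y for y in labels],   # noisy copy of the label
    [y if rng.random() < 0.6 else 1 - y for y in labels],   # weakly informative
    [rng.randint(0, 2) for _ in labels],                    # pure noise
    [rng.randint(0, 1) for _ in labels],                    # pure noise
    [rng.randint(0, 3) for _ in labels],                    # pure noise
]
kept, scores = chi2_filter(columns, labels, keep=4)


def toy_fitness(mask):
    """Hypothetical stand-in fitness: chi-square gain minus a per-feature penalty."""
    gain = sum(scores[i] for i, g in zip(kept, mask) if g)
    return max(gain - 5.0 * sum(mask), 0.0)


best_mask = genetic_search(len(kept), toy_fitness, rng)
selected = [i for i, g in zip(kept, best_mask) if g]
print("kept after filter:", kept)
print("selected by GA:", selected)
```

Running the filter first shrinks the space the genetic algorithm has to explore, which is the motivation for combining the two methods. To evaluate real subsets, `toy_fitness` could be replaced with a cross-validated classifier score.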