Author: Zhengkai Zhuang
Journal: 乳腺癌(英文) [Breast Cancer (English edition)]
DOI: 10.4236/abcr.2023.124013 (https://doi.org/10.4236/abcr.2023.124013)
Publication date: 2023-01-01
Publication type: Journal Article
Decipher Clinical and Genetic Underpins of Breast Cancer Survival with Machine Learning Methods
Breast cancer is one of the most common cancers among women worldwide, with more than two million new cases every year. The disease is associated with numerous clinical and genetic characteristics. In recent years, machine learning has been increasingly applied in medicine, including to predict the risk of malignant tumors such as breast cancer. Using clinical and targeted sequencing data from 1,980 primary breast cancer samples, this article analyzes these data and predicts survival status after breast cancer. After data engineering, feature selection, and a comparison of machine learning methods, the light gradient boosting machine (LightGBM) model with hyperparameter tuning performed best (precision = 0.818, recall = 0.816, F1 score = 0.817, ROC-AUC = 0.867). The top five determinants were the clinical features age at diagnosis, Nottingham Prognostic Index, and cohort, and the genetic features rheb and nr3c1. The study sheds light on the rational allocation of medical resources and, through the identified clinical and genetic risk factors, offers insights into the early prevention, diagnosis, and treatment of breast cancer.
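As a reading aid for the reported metrics, the sketch below shows how precision, recall, and F1 relate to one another. It is not the author's code. The function names `f1_from` and `precision_recall_f1` are hypothetical, and the abstract does not say how the scores were averaged across classes, so the harmonic-mean check on the reported numbers is only an approximate consistency check.

```python
from typing import Tuple


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute (precision, recall, F1) from confusion-matrix counts.

    tp: true positives, fp: false positives, fn: false negatives.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)


if __name__ == "__main__":
    # Precision and recall as reported in the abstract; assumes a plain
    # harmonic mean, which may differ slightly from a class-averaged F1.
    print(round(f1_from(0.818, 0.816), 3))  # prints 0.817
```

The harmonic mean of the reported precision and recall rounds to 0.817, which agrees with the F1 score given in the abstract.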