Yang Zhao, Xiaojie Wang, Lei Ma, Dangguo Shao, Y. Xiang, Xin Xiong, L. Zhang
Constructing non-small cell lung cancer survival prediction model based on Borderline-SMOTE and PFS
国际生物医学工程杂志 (International Journal of Biomedical Engineering), published 2019-08-28
DOI: 10.3760/CMA.J.ISSN.1673-4181.2019.04.011 (https://doi.org/10.3760/CMA.J.ISSN.1673-4181.2019.04.011)
Abstract
Objective
To predict 5-year survival in patients with non-small cell lung cancer (NSCLC) using machine learning, and to improve the efficiency and accuracy of that prediction.
Methods
The experiments were performed on NSCLC data from the SEER database. Because the patient data were class-imbalanced, the Borderline-SMOTE method was used to resample them. The perturbation-based feature selection (PFS) method and the decision tree (DT) algorithm were then used to screen features and construct the postoperative survival prediction model.
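The resampling step can be illustrated with a minimal sketch of the Borderline-SMOTE1 idea, written in plain Python. This is not the authors' implementation: the function names, the parameters `m` and `k`, and the toy data are assumptions made for illustration. A minority sample is treated as borderline when at least half, but not all, of its `m` nearest neighbours belong to the majority class, and synthetic samples are interpolated only from those borderline samples.

```python
import math
import random

def k_nearest(point, candidates, k):
    """Return the indices of the k candidates closest to point (Euclidean)."""
    order = sorted(range(len(candidates)),
                   key=lambda i: math.dist(point, candidates[i]))
    return order[:k]

def borderline_smote(X, y, minority=1, m=3, k=2, seed=0):
    """Oversample the minority class from its borderline samples only.

    Hypothetical helper for illustration; m and k are assumed defaults.
    """
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_new = (len(y) - len(min_idx)) - len(min_idx)  # samples needed to balance
    if n_new <= 0 or len(min_idx) < 2:
        return list(X), list(y)

    # Step 1: a minority sample is "in danger" if m/2 <= (majority neighbours) < m.
    danger = []
    for i in min_idx:
        others = [j for j in range(len(X)) if j != i]
        nn = k_nearest(X[i], [X[j] for j in others], m)
        n_major = sum(1 for t in nn if y[others[t]] != minority)
        if m / 2 <= n_major < m:
            danger.append(i)
    if not danger:
        return list(X), list(y)

    # Step 2: interpolate between a danger sample and one of its minority neighbours.
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        mates = [j for j in min_idx if j != i]
        nn = k_nearest(X[i], [X[j] for j in mates], min(k, len(mates)))
        j = mates[rng.choice(nn)]
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)

# Toy data (invented): six majority points and three minority points on a line.
X = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0], [4.0, 0.0], [5.0, 0.0],
     [4.5, 0.0], [5.5, 0.0], [7.2, 0.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_res, y_res = borderline_smote(X, y)
print(len(X_res), y_res.count(0), y_res.count(1))
```

In practice a maintained implementation such as `BorderlineSMOTE` from the imbalanced-learn package would normally be preferred over a hand-written version like this one.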
Results
After resampling, the patient data were balanced, and seven prognostic variables were selected, including primary site, stage group, surgical primary site, International Classification of Diseases code, race, and grade. Compared with the LASSO, Tree-based, PFS-SVM, and PFS-kNN models, the model built with PFS-DT achieved the best predictive performance.
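The abstract does not spell out how PFS scores a feature, so the following is only a sketch of one common perturbation scheme, under stated assumptions: each feature column is shuffled in turn and the resulting drop in accuracy is taken as that feature's importance. The nearest-centroid classifier is a stand-in chosen to keep the example self-contained (the paper uses a decision tree), and the toy data, the helper names, and the 0.05 threshold are all hypothetical.

```python
import random

def fit_centroids(X, y):
    """Fit a nearest-centroid classifier: one mean vector per class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(centroids, row):
    """Return the class whose centroid is closest to row."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, r) == t for r, t in zip(X, y)) / len(y)

def perturbation_scores(X, y, n_repeats=20, seed=0):
    """Mean accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    model = fit_centroids(X, y)
    base = accuracy(model, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # perturb feature j only
            X_pert = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(model, X_pert, y))
        scores.append(sum(drops) / n_repeats)
    return scores

# Toy data (invented): feature 0 separates the classes, feature 1 is constant.
X = [[0.0, 0.5], [0.1, 0.5], [0.2, 0.5], [0.3, 0.5],
     [1.0, 0.5], [1.1, 0.5], [1.2, 0.5], [1.3, 0.5]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
scores = perturbation_scores(X, y)
selected = [j for j, s in enumerate(scores) if s > 0.05]  # assumed threshold
print(scores, selected)
```

Scoring on the training data, as done here for brevity, tends to overstate importance; a held-out split would be the more careful choice.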
Conclusions
The PFS-DT survival prediction model can effectively improve the accuracy of postoperative survival prediction in patients with NSCLC, and can serve as a reference for doctors when planning treatment and working to improve prognosis.
Key words:
Non-small cell lung cancer; Imbalance; Feature selection; Survival prediction