CT Radio Genomics of Non-Small Cell Lung Cancer Using Machine and Deep Learning
Yi-yun Song
2021 IEEE International Conference on Consumer Electronics and Computer Engineering (ICCECE), published 2021-01-15
DOI: 10.1109/ICCECE51280.2021.9342170 (https://doi.org/10.1109/ICCECE51280.2021.9342170)
Citations: 0
Abstract
Non-small cell lung cancer (NSCLC) is the most common type of lung cancer, and its most common genetic markers are mutations of the epidermal growth factor receptor (EGFR) gene and the Kirsten rat sarcoma (KRAS) gene. The objective of this paper was to predict EGFR and KRAS mutation status from CT features using machine learning models. Features extracted from 144 CT scans of the tumor area included statistical, shape, pathological, and deep learning features; the deep learning features were extracted with the ResNet-34 neural network. All features were fed into machine learning models (random forest, logistic regression, and support vector machine), which were evaluated with 10-fold cross-validation, confusion matrices, and the area under the ROC curve (AUC). P-values were calculated with t-tests and Mann-Whitney rank-sum tests and indicated a statistically significant difference between mutated and non-mutated genes. All machine learning models predicted EGFR mutations better than KRAS mutations. For EGFR mutations, logistic regression (AUC = 0.85) and the support vector machine (AUC = 0.84) performed best. For KRAS mutations, the models performed sub-optimally, with the best performance from the support vector machine (AUC = 0.73). Permutation feature importance showed that including deep learning features improved the models' performance. Overall, machine learning algorithms, if optimized and provided with more data, could prove useful in predicting EGFR and KRAS mutation status in NSCLC patients, saving time and money.
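The abstract reports both ROC AUC values and Mann-Whitney rank-sum tests, and the two are closely related: the AUC of a score equals the Mann-Whitney U statistic of the mutated group divided by the number of mutated/non-mutated pairs. The sketch below illustrates that relationship in plain Python. It is not the paper's code; the function names and the toy labels and scores are assumptions made up for illustration, and the abstract does not say which software was used.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(positives, negatives):
    """U statistic of the positive group.

    Counts (positive, negative) pairs where the positive score is higher;
    a tied pair counts as 0.5.
    """
    ranks = average_ranks(list(positives) + list(negatives))
    n_pos = len(positives)
    rank_sum_pos = sum(ranks[:n_pos])
    return rank_sum_pos - n_pos * (n_pos + 1) / 2


def roc_auc(labels, scores):
    """Area under the ROC curve, computed as U / (n_pos * n_neg)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    return mann_whitney_u(pos, neg) / (len(pos) * len(neg))


# Hypothetical toy data: 1 = mutated, 0 = non-mutated; scores are made-up
# model outputs, not results from the paper.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
print(roc_auc(labels, scores))  # 11 of the 12 pairs are ordered correctly
```

Read this way, an AUC of 0.85 means that a randomly chosen mutated case receives a higher score than a randomly chosen non-mutated case about 85% of the time. In a cross-validated setting such as the 10-fold scheme described above, the scores would come from held-out folds rather than from the data the model was trained on.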