A new hybrid global optimization approach for selecting clinical and biological features that are relevant to the effective diagnosis of ovarian cancer
Abeer Alzubaidi, David J. Brown, G. Cosma, A. Pockley
2016 IEEE Symposium Series on Computational Intelligence (SSCI), December 2016. DOI: 10.1109/SSCI.2016.7849954
Citations: 4
Abstract
Reducing the number of features whilst maintaining an acceptable classification accuracy is a fundamental step in constructing cancer predictive models. In this work, we introduce a novel hybrid (MI-LDA) feature selection approach for the diagnosis of ovarian cancer. The hybrid approach is embedded within a global optimization framework and offers promising improvements in both feature selection and classification accuracy. Global Mutual Information (MI) based feature selection optimizes the search for the best feature subsets in order to select the predictors most highly correlated with ovarian cancer diagnosis. The most discriminative cancer predictors are then passed to a Linear Discriminant Analysis (LDA) classifier, and a Genetic Algorithm (GA) is applied to optimize the search with respect to the estimated error rate of the LDA classifier (MI-LDA). Experiments were performed using an ovarian cancer dataset obtained from the FDA-NCI Clinical Proteomics Program Databank. The performance of the hybrid feature selection approach was evaluated using both a Support Vector Machine (SVM) classifier and an LDA classifier. A comparison of the results revealed that the proposed (MI-LDA)-LDA model outperformed the (MI-LDA)-SVM model in selecting the most discriminative feature subset and achieved the highest predictive accuracy. The proposed system can therefore be used as an efficient tool for finding predictors and patterns in serum (blood)-derived proteomic data for the detection of ovarian cancer.
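The two-stage pipeline described above (an MI filter followed by a GA wrapper scored by LDA error) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a simple univariate MI ranking with equal-width binning stands in for the paper's global MI search, the LDA "estimated error rate" is assumed to be 5-fold cross-validated error, and all function names, GA settings (population 20, 30 generations, 5% bit-flip mutation), the small ridge term added to the covariance, and the synthetic data are hypothetical.

```python
# Hypothetical sketch of an MI filter + GA/LDA wrapper; not the paper's code.
import numpy as np


def mutual_information(x, y, bins=10):
    """MI (nats) between feature x, discretized into equal-width bins, and class labels y."""
    edges = np.histogram_bin_edges(x, bins=bins)
    xb = np.digitize(x, edges[1:-1])  # bin index 0..bins-1 for each sample
    mi = 0.0
    for xv in np.unique(xb):
        px = np.mean(xb == xv)
        for yv in np.unique(y):
            pxy = np.mean((xb == xv) & (y == yv))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * np.mean(y == yv)))
    return mi


def lda_fit(X, y, shrink=1e-3):
    """Fit LDA: class means, pooled within-class covariance (plus a small ridge), priors."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    centered = np.vstack([X[y == c] - means[i] for i, c in enumerate(classes)])
    cov = centered.T @ centered / (len(X) - len(classes))
    cov += shrink * np.eye(X.shape[1])  # assumed ridge term for numerical stability
    priors = np.array([np.mean(y == c) for c in classes])
    return classes, means, np.linalg.inv(cov), priors


def lda_predict(model, X):
    """Assign each row to the class with the largest linear discriminant score."""
    classes, means, icov, priors = model
    scores = X @ icov @ means.T - 0.5 * np.sum(means @ icov * means, axis=1) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]


def cv_error(X, y, folds=5, seed=0):
    """Estimated LDA error rate via k-fold cross-validation (assumed estimator)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    errs = []
    for test in np.array_split(idx, folds):
        train = np.setdiff1d(idx, test)
        model = lda_fit(X[train], y[train])
        errs.append(np.mean(lda_predict(model, X[test]) != y[test]))
    return float(np.mean(errs))


def ga_select(X, y, pop=20, gens=30, p_mut=0.05, seed=0):
    """GA over binary feature masks; fitness = CV error of LDA on the masked features (needs >= 2 features)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]

    def fitness(mask):
        return 1.0 if not mask.any() else cv_error(X[:, mask], y)

    population = rng.random((pop, d)) < 0.5
    for _ in range(gens):
        errors = np.array([fitness(m) for m in population])
        elite = population[np.argsort(errors)[: pop // 2]]  # keep the better half unchanged
        children = []
        while len(children) < pop - len(elite):
            a, b = elite[rng.integers(len(elite), size=2)]
            cut = rng.integers(1, d)  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= rng.random(d) < p_mut  # bit-flip mutation
            children.append(child)
        population = np.vstack([elite, np.array(children)])
    errors = np.array([fitness(m) for m in population])
    best = int(np.argmin(errors))
    return population[best], float(errors[best])


def mi_lda_select(X, y, k=10):
    """Stage 1: keep the k highest-MI features. Stage 2: GA/LDA wrapper search among them."""
    mi = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    top = np.argsort(mi)[::-1][:k]
    mask, err = ga_select(X[:, top], y)
    return top[mask], err


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    y = rng.integers(0, 2, size=200)
    X = rng.normal(size=(200, 50))  # synthetic data, not the FDA-NCI dataset
    X[:, :3] += 1.5 * y[:, None]  # only features 0, 1, 2 carry class signal
    features, err = mi_lda_select(X, y)
    print("selected:", sorted(features.tolist()), "CV error:", round(err, 3))
```

The cheap MI filter shrinks the search space before the expensive wrapper, so the GA only explores subsets of the `k` retained features. Note that the minimum cross-validated error found during the GA search is an optimistically biased estimate, because the same folds guide the selection; a held-out test set or nested cross-validation would be needed for an unbiased accuracy figure. Replacing `lda_predict` with another classifier on the final subset would mirror the LDA-versus-SVM comparison in spirit.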