Feature Selection in Microarray Gene Expression Data Using Fisher Discriminant Ratio
Saeed Sarbazi-Azad, M. S. Abadeh, Mehdi Irannejad Najaf Abadi
2018 8th International Conference on Computer and Knowledge Engineering (ICCKE), published 2018-10-01
DOI: 10.1109/ICCKE.2018.8566649 (https://doi.org/10.1109/ICCKE.2018.8566649)
Citations: 3
Abstract
A major issue in microarray gene expression datasets is high dimensionality. Redundant features and a small number of samples hinder model learning, and the resulting models perform poorly. To build a model with high performance and a low error rate, it is essential to reduce the number of features. Over the last two decades, data complexity measures have been used for various purposes in machine learning, including feature selection. In the proposed method, the features of a dataset are first ranked by a data complexity measure, the Fisher discriminant ratio, and the highest-ranked features are then selected from the feature set. Experiments are performed on five well-known binary microarray datasets to assess the performance of the proposed method. For classification, support vector machine, decision tree, naive Bayes, and k-nearest neighbor algorithms were applied to the selected features. The results show strong performance, with low computational time and higher accuracy on the test data.
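
The rank-then-select step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so the code assumes the commonly used per-feature form of the Fisher discriminant ratio for two classes, (mean_0 - mean_1)^2 / (var_0 + var_1). The function names `fisher_discriminant_ratio` and `select_top_k`, the small epsilon guard, and the toy matrix are all hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def fisher_discriminant_ratio(X, y):
    """Score each column of X for a two-class problem.

    Assumed form (not confirmed by the abstract):
        (mean_0 - mean_1)^2 / (var_0 + var_1)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    if classes.size != 2:
        raise ValueError("this sketch handles exactly two classes")
    a = X[y == classes[0]]
    b = X[y == classes[1]]
    numerator = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    denominator = a.var(axis=0) + b.var(axis=0)
    # Small epsilon (an assumption) avoids division by zero for constant features.
    return numerator / (denominator + 1e-12)


def select_top_k(X, y, k):
    """Return the indices of the k highest-scoring features and all scores."""
    scores = fisher_discriminant_ratio(X, y)
    # Sort by descending score; the stable sort keeps ties in column order.
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores


# Toy data: 4 samples, 3 features. Only feature 0 separates the classes;
# feature 1 has identical class means and feature 2 is constant.
X_toy = np.array([
    [0.0, 1.0, 5.0],
    [2.0, 1.5, 5.0],
    [10.0, 1.0, 5.0],
    [12.0, 1.5, 5.0],
])
y_toy = np.array([0, 0, 1, 1])

top, scores = select_top_k(X_toy, y_toy, k=2)
X_reduced = X_toy[:, top]  # reduced matrix that would be passed to a classifier
```

Because each feature is scored independently from class means and variances, the ranking costs one pass over the data, which is consistent with the low computational time the abstract reports. The reduced matrix would then be handed to any of the classifiers the abstract names.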