Gene Expression Data For Gene Selection Using Ensemble Based Feature Selection
Mohamad Aouf, Amr A. Sharawi, Khaled Samir, Sultan Almotatiri, A. Bajahzar, Ghada Kareem
2019 Ninth International Conference on Intelligent Computing and Information Systems (ICICIS), published 2019-12-01
DOI: 10.1109/ICICIS46948.2019.9014722
Abstract
Next-generation sequencing (NGS) has transformed sequence-based research and is replacing the microarray because of its advantages. RNA-Seq is a high-throughput gene expression profiling technique built on NGS technologies. The main problem with high-throughput RNA-Seq datasets is their high dimensionality (variables > observations), so their dimensions must be reduced before prediction and classification. In this work, an ensemble-based feature selection approach is proposed to choose a small, optimal set of genes from various RNA-Seq cancer datasets. The approach combines two filter methods for feature selection: the signal-to-noise ratio (SNR) and the t-test. In addition, SVM-RFE (support vector machine recursive feature elimination) is used as an embedded feature selection method. An SVM classifier is used to validate and test the selected genes, with accuracy as the performance measure. The results show that the approach can select a small, optimal set of genes with high accuracy. Consequently, the selected genes can be used as biomarkers to diagnose cancer.
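The filter stage described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the SNR and t-test rankings are combined, so the union of the top-k genes from each filter is an assumption here, as are the Golub-style SNR formula, the Welch form of the t-statistic, the function names, and the toy data. This is not the authors' implementation.

```python
from math import sqrt
from statistics import mean, stdev


def snr_score(a, b):
    """Signal-to-noise ratio for one gene: |mean_a - mean_b| / (sd_a + sd_b).

    Assumed Golub-style definition; each class needs at least two samples.
    """
    denom = stdev(a) + stdev(b)
    return abs(mean(a) - mean(b)) / denom if denom > 0 else 0.0


def t_score(a, b):
    """Absolute Welch t-statistic for one gene (unequal variances assumed)."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) / se if se > 0 else 0.0


def top_k(scores, k):
    """Indices of the k highest-scoring genes."""
    order = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return set(order[:k])


def ensemble_filter(X, y, k):
    """Return genes ranked in the top k by either filter.

    X is a list of samples (rows) by genes (columns); y holds 0/1 class labels.
    Taking the union of the two top-k sets is an assumption of this sketch.
    """
    snr, tt = [], []
    for g in range(len(X[0])):
        a = [row[g] for row, label in zip(X, y) if label == 0]
        b = [row[g] for row, label in zip(X, y) if label == 1]
        snr.append(snr_score(a, b))
        tt.append(t_score(a, b))
    return sorted(top_k(snr, k) | top_k(tt, k))


# Hypothetical toy data: 6 samples x 4 genes; genes 0 and 2 separate the classes.
X = [
    [1.0, 5.0, 2.0, 3.0],
    [1.2, 4.0, 2.1, 3.5],
    [0.8, 6.0, 1.9, 2.5],
    [5.0, 5.5, 8.0, 3.2],
    [5.2, 4.5, 8.2, 2.8],
    [4.8, 5.0, 7.9, 3.0],
]
y = [0, 0, 0, 1, 1, 1]

selected = ensemble_filter(X, y, k=2)
print(selected)  # [0, 2]
```

The genes that survive this filter would then go to the embedded stage. One way to prototype that stage, again as an assumption rather than the paper's setup, is scikit-learn's `RFE` wrapped around `SVC(kernel="linear")`, with a held-out accuracy score from an SVM classifier used for validation.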