Novel algorithm to extract multiple solutions for RNA sequence classification problem
Naoual Guannoni, F. Mhamdi, Emanuel Weitschek, M. Elloumi
2019 International Conference on High Performance Computing & Simulation (HPCS), published 2019-07-01
DOI: 10.1109/HPCS48598.2019.9188203 (https://doi.org/10.1109/HPCS48598.2019.9188203)
Citations: 0
Abstract
Methods for extracting knowledge from Next Generation Sequencing (NGS) data are in high demand. NGS technology has led to an explosion in the amount of genomic data, but its very efficiency has made the analysis of this vast body of data, including gene interaction and expression studies, a challenge. In this work, we focus on RNA-seq gene expression analysis, specifically in cancer studies, using rule-based supervised classification algorithms that build a model able to discriminate tumoral from normal cases. State-of-the-art algorithms compute just a single classification model that contains few features. In contrast, our goal is to elicit more knowledge by computing many classification models, and thereby to identify most of the features related to the investigated class. Major efforts have been made in this direction with rule-based algorithms (the CAMUR method), and an initial step has been taken with tree-based ones. In this paper, we propose a new method that extracts multiple equivalent classification models. It integrates a rule-based classification method with a feature elimination technique in order to obtain more compact, exact, and interpretable models in reduced execution time. We analyze a breast cancer RNA-seq data set extracted from The Cancer Genome Atlas (TCGA) and compare our results with those of the existing method (CAMUR). Experimental results show the efficacy of the proposed method: it yields several reliable and efficient classification models compared with CAMUR, and it runs faster than the CAMUR algorithm.
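The general idea of combining a rule-based classifier with feature elimination to obtain several alternative models can be illustrated with a minimal sketch. This is not the authors' algorithm and not CAMUR; the abstract does not give those details. As a stand-in for a rule-based classifier it assumes the simplest possible rule, a single-gene threshold, and it assumes that every feature used by an accepted model is removed before the next model is learned. The function names, the `min_accuracy` and `max_models` parameters, and the toy expression values are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (feature index, threshold, label predicted when value > threshold, training accuracy)
Rule = Tuple[int, float, str, float]


def best_single_feature_rule(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    features: Sequence[int],
    positive: str = "tumoral",
    negative: str = "normal",
) -> Optional[Rule]:
    """Return the most accurate one-feature threshold rule over the allowed features."""
    best: Optional[Rule] = None
    for f in features:
        values = sorted({row[f] for row in X})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            # Try both orientations: high expression -> positive, or -> negative.
            for high, low in ((positive, negative), (negative, positive)):
                correct = sum(
                    (high if row[f] > t else low) == label
                    for row, label in zip(X, y)
                )
                acc = correct / len(y)
                if best is None or acc > best[3]:
                    best = (f, t, high, acc)
    return best


def extract_multiple_models(
    X: Sequence[Sequence[float]],
    y: Sequence[str],
    min_accuracy: float = 0.9,
    max_models: int = 10,
) -> List[Rule]:
    """Learn a rule, eliminate its feature, and repeat while accuracy stays acceptable."""
    remaining = list(range(len(X[0])))
    models: List[Rule] = []
    while remaining and len(models) < max_models:
        rule = best_single_feature_rule(X, y, remaining)
        if rule is None or rule[3] < min_accuracy:
            break  # no sufficiently accurate model is left
        models.append(rule)
        remaining.remove(rule[0])  # feature elimination step
    return models


# Made-up toy data: 6 samples x 3 genes (not taken from TCGA).
X = [
    [8.1, 1.2, 5.0],
    [7.9, 0.9, 3.1],
    [8.5, 1.5, 4.2],
    [2.0, 6.3, 4.9],
    [1.7, 5.8, 3.0],
    [2.4, 6.9, 4.4],
]
y = ["tumoral", "tumoral", "tumoral", "normal", "normal", "normal"]

models = extract_multiple_models(X, y)
for feature, threshold, high_label, accuracy in models:
    print(f"gene {feature}: value > {threshold:.2f} -> {high_label} (accuracy {accuracy:.2f})")
```

On the toy data, the first two genes each separate the classes on their own, so the loop returns two alternative models and stops when only the uninformative third gene remains. A realistic implementation would estimate accuracy on held-out data rather than on the training set and would use a proper rule learner that can combine several features per model.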