An embedded method for gene identification in heterogenous data involving unwanted heterogeneity
Meng Lu
2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), 2018-12-01
DOI: https://doi.org/10.1109/BIBM.2018.8621445
Citations: 1
Abstract
Modern applications such as bioinformatics collect data in many different ways, and the resulting data are heterogeneous. This poses a challenge for traditional variable selection methods, which assume that the data are independent and identically distributed. Existing statistical models that account for unwanted variation can be applied to gene identification in heterogeneous genetic data, but they suffer from variable redundancy and a lack of predictability. To address this, we propose an embedded variable selection method for gene identification, formulated from a sparse learning perspective, that accounts for the unwanted heterogeneity blurring the true gene effects. We investigate its performance on two gene identification problems, one unsupervised and one supervised, in which the benchmark data samples are heterogeneous and collected with group structures. The results demonstrate the superiority of our method over state-of-the-art methods: in both cases it effectively accounts for the unwanted heterogeneity.