Biomarker Discovery via Optimal Bayesian Feature Filtering for Structured Multiclass Data
Ali Foroughi pour, Lori A. Dalton
Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics
Published: 2018-08-15
DOI: 10.1145/3233547.3233558 (https://doi.org/10.1145/3233547.3233558)
Citation count: 3
Abstract
Biomarker discovery aims to find a shortlist of high-profile biomarkers that can be further verified and used in downstream analysis. Many biomarkers exhibit structured multiclass behavior, in which the groups of interest can be clustered into a small number of patterns such that groups assigned the same pattern share a common governing distribution. Although several algorithms have been proposed for multiclass problems, to the best of our knowledge none can take constraints on the group-pattern assignment, or structure, as input and output both high-profile potential biomarkers and the structure they satisfy. Post hoc analyses may be used to infer the structure, but ignoring such information during feature selection prevents it from taking full advantage of the experimental data. Recent work proposes a Bayesian framework for feature selection that places priors on the feature-label distribution and the label-conditioned feature distribution. Here we extend this framework to structured multiclass problems, solve the proposed model for the case of independent features, evaluate it in several synthetic simulations, apply it to two cancer datasets, and perform enrichment analysis. Many of the highly ranked genes and pathways are suggested to be affected in the cancer under study. We also find potentially new biomarkers. We not only detect biomarkers but also make inferences about the underlying distributional connections across classes, which provide additional insight into cancer biology.
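To make the notion of a structure concrete, the sketch below enumerates every way a handful of groups can be assigned to patterns, reading a structure as a partition of the groups in which groups in the same block share one governing distribution. This is only an illustration of the combinatorial object described above, not the authors' method: the group names and the `set_partitions` helper are hypothetical, and no priors or data are involved.

```python
from typing import Iterator, List, Sequence


def set_partitions(groups: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every way to split `groups` into non-empty patterns (blocks)."""
    if not groups:
        yield []
        return
    first, rest = groups[0], groups[1:]
    for partition in set_partitions(rest):
        # Add `first` to each existing pattern in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give `first` a pattern of its own.
        yield [[first]] + partition


# Hypothetical group labels, chosen for illustration only.
groups = ["normal", "subtype_A", "subtype_B"]
structures = list(set_partitions(groups))

for structure in structures:
    print(structure)
```

Under this reading, the single-block partition corresponds to a feature whose distribution is identical across all groups, while every other partition describes one candidate pattern of differences. The number of candidate structures grows quickly with the number of groups, which is one reason the ability to constrain the structure as an input is useful.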