Biomarker Discovery via Optimal Bayesian Feature Filtering for Structured Multiclass Data

Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Pub Date: 2018-08-15. DOI: 10.1145/3233547.3233558

Ali Foroughi pour, Lori A. Dalton

Citations: 3

Abstract

Biomarker discovery aims to find a shortlist of high-profile biomarkers that can be further verified and utilized in downstream analysis. Many biomarkers exhibit structured multiclass behavior, where groups of interest may be clustered into a small number of patterns such that groups assigned the same pattern share a common governing distribution. Although several algorithms have been proposed for multiclass problems, to the best of our knowledge none can take constraints on the group-pattern assignment, or structure, as input and output both high-profile potential biomarkers and the structure they satisfy. Post hoc analyses may be used to infer the structure, but ignoring such information prevents feature selection from taking full advantage of the experimental data. Recent work proposes a Bayesian framework for feature selection that places priors on the feature-label distribution and the label-conditioned feature distribution. Here we extend this framework to structured multiclass problems, solve the proposed model for the case of independent features, evaluate it in several synthetic simulations, apply it to two cancer datasets, and perform enrichment analysis. Many of the highly ranked genes and pathways are suggested to be affected in the cancer under study. We also find potentially new biomarkers. We not only detect biomarkers but also make inferences about the underlying distributional connections across classes, which provide additional insight into cancer biology.
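To make the notion of a "structure" concrete, the following is a minimal sketch, not the authors' model. It assumes a single feature, treats a structure as a partition of the group labels, and scores each partition by pooling the groups in each block and computing a Gaussian marginal likelihood under a conjugate normal-inverse-gamma prior. The prior family, the hyperparameters (`m0`, `k0`, `a0`, `b0`), the uniform prior over structures, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math
from typing import Dict, Iterator, List, Sequence, Tuple

Structure = Tuple[Tuple[str, ...], ...]


def set_partitions(items: Sequence[str]) -> Iterator[List[List[str]]]:
    """Yield every partition of `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], list(items[1:])
    for partition in set_partitions(rest):
        # Either `first` forms a block of its own ...
        yield [[first]] + partition
        # ... or it joins one of the existing blocks.
        for i, block in enumerate(partition):
            yield partition[:i] + [[first] + block] + partition[i + 1:]


def log_marginal(xs: Sequence[float], m0: float = 0.0, k0: float = 1.0,
                 a0: float = 1.0, b0: float = 1.0) -> float:
    """Log marginal likelihood of xs under a normal-inverse-gamma prior.

    Hyperparameter defaults are illustrative assumptions only.
    """
    n = len(xs)
    if n == 0:
        return 0.0
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    kn = k0 + n
    an = a0 + n / 2.0
    bn = b0 + 0.5 * ss + k0 * n * (mean - m0) ** 2 / (2.0 * kn)
    return (math.lgamma(an) - math.lgamma(a0)
            + a0 * math.log(b0) - an * math.log(bn)
            + 0.5 * (math.log(k0) - math.log(kn))
            - 0.5 * n * math.log(2.0 * math.pi))


def structure_posterior(data_by_group: Dict[str, List[float]]) -> Dict[Structure, float]:
    """Posterior over group partitions for one feature (uniform structure prior)."""
    log_scores: Dict[Structure, float] = {}
    for partition in set_partitions(sorted(data_by_group)):
        key = tuple(sorted(tuple(sorted(block)) for block in partition))
        # Groups in the same block share one distribution, so pool their samples.
        log_scores[key] = sum(
            log_marginal([x for g in block for x in data_by_group[g]])
            for block in partition
        )
    top = max(log_scores.values())  # subtract the max for numerical stability
    total = sum(math.exp(s - top) for s in log_scores.values())
    return {k: math.exp(s - top) / total for k, s in log_scores.items()}


# Hypothetical toy data: groups A and B look alike, group C is shifted.
toy = {
    "A": [-0.5, 0.1, 0.3, -0.2, 0.4],
    "B": [0.2, -0.3, 0.0, 0.5, -0.1],
    "C": [4.8, 5.3, 5.1, 4.6, 5.2],
}
posterior = structure_posterior(toy)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this sketch, one minus the posterior mass on the single-block partition (all groups sharing one distribution) could serve as a per-feature ranking score, and constraints on admissible structures could be imposed by filtering the partitions before normalizing. Enumerating all partitions is only practical for a small number of groups.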
