Joint Grid Discretization for Biological Pattern Discovery

Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Pub Date: 2020-09-21. DOI: 10.1145/3388440.3412415

Jiandong Wang, Sajal Kumar, Mingzhou Song


Citations: 1

Abstract

The complexity, dynamics, and scale of data acquired by modern biotechnology increasingly favor model-free computational methods that make minimal assumptions about underlying biological mechanisms. For example, single-cell transcriptome and proteome data have a throughput several orders of magnitude higher than that of bulk methods. Many model-free statistical methods for pattern discovery, such as mutual information and chi-squared tests, however, require discrete data. Most discretization methods minimize squared errors for each variable independently and so do not necessarily retain joint patterns. To address this issue, we present a joint grid discretization algorithm that preserves clusters in the original data. We evaluated this algorithm on simulated data to show its advantage over other methods in maintaining clusters, as measured by the adjusted Rand index. We also show that it promotes global functional patterns over independent patterns. On single-cell proteome and transcriptome data from leukemia and healthy blood, joint grid discretization captured known protein-to-RNA regulatory relationships while revealing previously unknown interactions. As such, joint grid discretization is applicable as a data transformation step in associative, functional, and causal inference of molecular interactions fundamental to systems biology. The developed software is publicly available at https://cran.r-project.org/package=GridOnClusters
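The contrast drawn in the abstract, between discretizing each variable independently and choosing a grid that keeps clusters intact, can be illustrated with a small sketch. The code below is not the GridOnClusters algorithm and does not use its API. It is a simplified stand-in that assumes cluster labels are already known, places a grid line midway between clusters that do not overlap along an axis, and scores the resulting grid cells against the labels with the adjusted Rand index. The toy data and all function names (`adjusted_rand_index`, `equal_width_cuts`, `cluster_aware_cuts`, `grid_cells`) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two labelings of the same points."""
    n = len(a)
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case: both labelings trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def equal_width_cuts(values, bins):
    """Per-variable baseline: split the observed range into equal-width bins."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    return [lo + step * i for i in range(1, bins)]


def cluster_aware_cuts(values, labels):
    """Simplified stand-in: cut midway between clusters that do not overlap on this axis."""
    spans = {}
    for v, c in zip(values, labels):
        lo, hi = spans.get(c, (v, v))
        spans[c] = (min(lo, v), max(hi, v))
    ordered = sorted(spans.values())
    cuts = []
    reach = ordered[0][1]  # right edge of everything seen so far
    for lo, hi in ordered[1:]:
        if lo > reach:  # a gap separates this cluster from the previous ones
            cuts.append((reach + lo) / 2)
        reach = max(reach, hi)
    return cuts


def grid_cells(points, cuts_per_axis):
    """Map each point to its grid cell, one bin index per axis."""
    return [
        tuple(bisect_right(cuts, x) for x, cuts in zip(p, cuts_per_axis))
        for p in points
    ]


# Toy data (made up): two small clusters near the origin, one far away.
points = [
    (0.0, 0.0), (0.2, 0.2), (0.1, 0.3), (0.3, 0.1),      # cluster 0
    (1.0, 0.1), (1.2, 0.3), (1.1, 0.0), (1.3, 0.2),      # cluster 1
    (9.0, 9.0), (9.5, 10.0), (10.0, 9.5), (9.2, 9.8),    # cluster 2
]
truth = [0] * 4 + [1] * 4 + [2] * 4
axes = list(zip(*points))  # values per variable

equal_cells = grid_cells(points, [equal_width_cuts(ax, 3) for ax in axes])
aware_cells = grid_cells(points, [cluster_aware_cuts(ax, truth) for ax in axes])

ari_equal = adjusted_rand_index(truth, equal_cells)
ari_aware = adjusted_rand_index(truth, aware_cells)
print(f"equal-width ARI:   {ari_equal:.3f}")
print(f"cluster-aware ARI: {ari_aware:.3f}")
```

On this toy data, equal-width binning merges the two clusters near the origin into one cell, so its adjusted Rand index falls below 1, while the cluster-aware grid separates all three clusters. In practice the cluster labels would come from a clustering step on the continuous data, and the published package should be consulted for the actual method.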



