Knowledge-Guided Biclustering via Sparse Variational EM Algorithm

Changgee Chang, Jihwan Oh, Eun Jeong Min, Qi Long
Published in: 10th IEEE International Conference on Big Knowledge: proceedings, 10-11 November 2019, Beijing, China, vol. 2019, pp. 25-32
Publication date: 2019-11-01 (Epub 2019-12-30)
DOI: 10.1109/icbk.2019.00012 (https://doi.org/10.1109/icbk.2019.00012)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8291726/pdf/nihms-1588833.pdf
Citations: 0

Abstract

In the analysis of a gene expression data matrix, for example, a biclustering is defined as a set of biclusters, where each bicluster is a group of genes together with a group of samples in which those genes are differentially expressed. Although many data mining approaches for biclustering exist in the literature, only a few are able to incorporate prior knowledge into the analysis, which can lead to great improvements in accuracy and interpretability, and all are limited in handling discrete data types. We propose a generalized biclustering approach that can be used for integrative analysis of multi-omics data with different data types. Our method is capable of utilizing biological information that can be represented by a graph, such as functional genomics and functional proteomics, and of accommodating a combination of continuous and discrete data types. The proposed method builds on a generalized Bayesian factor analysis framework, and a variational EM approach is used to obtain parameter estimates, where the latent quantities in the log-likelihood are iteratively imputed by their conditional expectations. The biclusters are retrieved via the sparse estimates of the factor loadings and the conditional expectation of the latent factors. To obtain the sparse conditional expectation of the latent factors, a novel sparse variational EM algorithm is used. We demonstrate the superiority of our method over several existing biclustering methods in extensive simulation experiments and in integrative analysis of multi-omics data.
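
To make the idea of reading biclusters off a sparse factor model concrete, here is a minimal Python sketch. It is not the authors' sparse variational EM algorithm: it swaps in plain soft-thresholding for the sparsity step, handles only continuous (Gaussian) data, and uses no graph-based prior knowledge. The function names, the unit-variance assumptions, and the thresholds `lam_w` and `lam_z` are all hypothetical choices made for illustration.

```python
import numpy as np


def soft_threshold(a, lam):
    """Shrink entries toward zero; entries with |a| <= lam become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def sparse_factor_biclusters(X, n_factors=1, lam_w=0.5, lam_z=0.5, n_iter=50):
    """Toy alternating scheme for X ~ W @ Z with sparse W and sparse Z.

    X is genes x samples, W (genes x K) holds factor loadings and
    Z (K x samples) holds latent factors. Illustrative stand-in only,
    not the algorithm from the paper.
    """
    # Initialise from a truncated SVD, splitting singular values evenly.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :n_factors] * np.sqrt(s[:n_factors])
    Z = np.sqrt(s[:n_factors])[:, None] * Vt[:n_factors]
    eye = np.eye(n_factors)
    for _ in range(n_iter):
        # E-step-like update: posterior mean of the factors under a
        # standard normal prior and unit noise variance (assumed here),
        # then shrunk so that many entries are exactly zero.
        Z = soft_threshold(np.linalg.solve(W.T @ W + eye, W.T @ X), lam_z)
        # M-step-like update: ridge-regularised loadings, then shrunk.
        W = soft_threshold(np.linalg.solve(Z @ Z.T + eye, Z @ X.T).T, lam_w)
    # Bicluster k = genes with nonzero loading and samples with nonzero factor.
    biclusters = [
        (np.flatnonzero(W[:, k]), np.flatnonzero(Z[k])) for k in range(n_factors)
    ]
    return W, Z, biclusters
```

Each returned bicluster pairs the row indices where a loading column is nonzero with the column indices where the matching factor row is nonzero, which mirrors how the abstract describes retrieving biclusters from sparse loadings and sparse latent factors. The scale of `W` and `Z` is not identified in this toy version; only their supports are meaningful.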
