Literature based Bayesian analysis of gene expression data

Lijing Xu, R. Homayouni, E. George
2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), p. 1032. Published 2011-11-12. DOI: 10.1109/BIBMW.2011.6112549 (https://doi.org/10.1109/BIBMW.2011.6112549)
Citations: 0

Abstract

Recent research has focused on incorporating biological function and pathway information into the analysis of gene expression data, partly to compensate for insufficient experimental replication, low signal-to-noise ratios, lack of reproducibility, and/or multiple-testing confounds. A Bayesian approach appears well suited to incorporating functional information into gene expression analysis. In this study, we tested the feasibility of using literature-derived gene relationships in a Bayesian model to analyze gene expression data. Prior distributions were constructed from gene associations derived from the biomedical literature using Latent Semantic Indexing (LSI). The LSI model was built from more than 1 million Medline abstracts corresponding to 22,000 human and mouse genes. A key advantage of LSI is that both explicit and implicit gene relationships can be derived from the literature. Gene neighborhoods were determined using latent Gaussian Markov random fields and a logistic transformation of the latent variables. We tested the procedure on a microarray dataset of interferon-stimulated genes in mouse embryonic fibroblasts. By integrating functional information from the literature, the Bayesian approach identified relevant genes that previously did not meet the 0.05 significance level. Compared with a standard mixture model, the spatial mixture model has more power to identify genes that are directly or indirectly regulated by interferon. The spatial model raised the ranks of some genes known to be affected by interferon treatment, such as Nmi (N-myc and STAT interactor) and ifi35 (interferon-induced protein 35). It also identified genes that were previously ignored because of their marginal p-values, such as dpysl2, map2k1, msn, Psck5, and Il6st. Interestingly, these genes appear to be indirectly related to interferon treatment. In summary, we show that our procedure increases statistical power and produces more biologically meaningful gene lists. These results suggest that Bayesian methods that incorporate functional information from the literature may improve the analysis of gene expression data.
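
The abstract names three ingredients: LSI-derived gene associations, gene neighborhoods, and a logistic transformation of latent variables in a Gaussian Markov random field. The sketch below illustrates those ideas on a toy term-by-gene matrix. It is not the authors' implementation. The function names, the similarity threshold, the toy data, and the choice of a neighbor-averaging conditional mean (as in an intrinsic conditional autoregressive prior) are all assumptions made for illustration.

```python
import numpy as np


def lsi_gene_similarity(term_gene: np.ndarray, rank: int) -> np.ndarray:
    """Cosine similarity between genes in a rank-reduced LSI space.

    term_gene has one row per term and one column per gene (hypothetical layout).
    """
    # Thin SVD: term_gene = U @ diag(s) @ Vt, singular values in descending order.
    _, s, vt = np.linalg.svd(term_gene, full_matrices=False)
    # Keep the top `rank` factors; each row is one gene's reduced vector.
    gene_vecs = (s[:rank, None] * vt[:rank]).T
    norms = np.linalg.norm(gene_vecs, axis=1, keepdims=True)
    unit = gene_vecs / np.where(norms == 0, 1.0, norms)
    return unit @ unit.T


def neighborhoods(sim: np.ndarray, threshold: float) -> np.ndarray:
    """Boolean adjacency: genes are neighbors if similarity >= threshold (assumed rule)."""
    adj = sim >= threshold
    np.fill_diagonal(adj, False)  # a gene is not its own neighbor
    return adj


def neighbor_mean(x: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """Average latent value over each gene's neighbors (0 for isolated genes).

    Assumption: this mirrors the conditional mean of an intrinsic CAR-type GMRF.
    """
    degree = adj.sum(axis=1)
    sums = adj.astype(float) @ x
    return np.where(degree > 0, sums / np.maximum(degree, 1), 0.0)


def logistic(x: np.ndarray) -> np.ndarray:
    """Map latent values on the real line to probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


# Toy data: genes 0 and 1 share terms, genes 2 and 3 share terms.
toy_term_gene = np.array(
    [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 3.0, 1.0],
     [0.0, 0.0, 1.0, 3.0]]
)
toy_sim = lsi_gene_similarity(toy_term_gene, rank=2)
toy_adj = neighborhoods(toy_sim, threshold=0.5)
toy_latent = np.array([2.0, 0.0, -1.0, -3.0])  # hypothetical latent values
toy_prior = logistic(neighbor_mean(toy_latent, toy_adj))
```

In this toy setup a gene's prior probability of being differentially expressed is pulled toward the latent values of its literature neighbors, which is the intuition behind a gene with a marginal p-value gaining rank when its neighbors show strong signal.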