Weighted pooling high-throughput gene expression data sets to maximize the functional coherence of the top rank genes

Xiaodong Zhou, E. George
Published in: 2011 IEEE International Conference on Bioinformatics and Biomedicine Workshops (BIBMW), pp. 1033-1033. Publication date: 2011-11-12. DOI: https://doi.org/10.1109/BIBMW.2011.6112550
Citation count: 0

Weighted pooling high-throughput gene expression data sets to maximize the functional coherence of the top rank genes
In a typical gene expression study that uses a high-throughput technique such as microarray, a biologist usually focuses on the top genes ranked by P-value to establish gene functional relationships and networks, biological pathways, and the microbiological ramifications of the gene selection. With more datasets publicly available, researchers pool data from independent experiments, typically by pooling P-values with equal weight assigned to each dataset, aiming to extract more biological information from the pooled data. However, dataset quality may vary substantially, so assigning equal weights may not give the optimal result. Applying the equal-weights approach to six independent datasets, we observe that the top-ranked genes of the pooled data have less functional coherence than those of the single dataset with the highest functional coherence. We propose a procedure based on enhanced simulated annealing (ESA) and literature semantic indexing cohesive (LSI-c) analysis that assigns optimal weights to datasets so as to maximize the functional coherence of the top-ranked genes ordered by their pooled P-values. We observe significantly more functional coherence in the optimally pooled data than in any single dataset or in data pooled with equal weights. Identifying top-ranked genes through our optimal procedure should improve downstream analysis.
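The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how P-values are pooled, so the code below assumes a weighted Stouffer Z-score combination of one-sided P-values. It also uses a plain simulated annealing loop as a stand-in for the authors' enhanced simulated annealing (ESA), and `coherence` is a hypothetical callable standing in for an LSI-c functional coherence score. All function and parameter names are illustrative and do not come from the paper.

```python
import math
import random
from statistics import NormalDist

_NORM = NormalDist()  # standard normal distribution


def pool_pvalues(pvals, weights):
    """Combine one gene's one-sided P-values across datasets.

    Assumption: weighted Stouffer Z method; the abstract does not name one.
    """
    z_scores = []
    for p in pvals:
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep inv_cdf inside (0, 1)
        z_scores.append(_NORM.inv_cdf(1.0 - p))
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return 1.0 - _NORM.cdf(numerator / denominator)


def top_genes(pvalue_table, weights, k):
    """Return the k genes with the smallest pooled P-values."""
    pooled = {gene: pool_pvalues(ps, weights) for gene, ps in pvalue_table.items()}
    return sorted(pooled, key=pooled.get)[:k]


def anneal_weights(pvalue_table, coherence, k, n_datasets,
                   steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for dataset weights that maximize coherence of the top-k genes.

    Plain simulated annealing, not the paper's ESA. `coherence` is a
    hypothetical scoring function: list of gene names -> float.
    """
    rng = random.Random(seed)
    weights = [1.0 / n_datasets] * n_datasets  # start from equal weights
    score = coherence(top_genes(pvalue_table, weights, k))
    best_weights, best_score = weights, score
    temp = t0
    for _ in range(steps):
        # Perturb every weight, keep it positive, renormalize to sum to 1.
        cand = [max(1e-6, w + rng.gauss(0.0, 0.1)) for w in weights]
        total = sum(cand)
        cand = [w / total for w in cand]
        cand_score = coherence(top_genes(pvalue_table, cand, k))
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_score >= score or rng.random() < math.exp((cand_score - score) / temp):
            weights, score = cand, cand_score
            if score > best_score:
                best_weights, best_score = weights, score
        temp *= cooling
    return best_weights, best_score
```

Here `pvalue_table` maps each gene name to a list of P-values, one per dataset, in a fixed dataset order. Because the search starts from equal weights and tracks the best score seen, the returned score is never lower than the equal-weights baseline under the same `coherence` function.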