An Efficient Algorithm for Mining Coherent Patterns from Heterogeneous Microarrays

Xiang Zhang, Wei Wang
19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)
Publication date: 2007-07-09
DOI: 10.1109/SSDBM.2007.30 (https://doi.org/10.1109/SSDBM.2007.30)
Citations: 2

Abstract

DNA microarray techniques give geneticists a way to monitor interactions among tens of thousands of genes simultaneously, and have become standard lab routines in gene discovery, disease diagnosis, and drug design. There has been extensive research on coherent subspace clustering of gene expressions measured under consistent experimental settings. Such settings imply that all experiments are run on the same batch of microarray chips with similar noise characteristics. Algorithms developed under this assumption may not be applicable to data collected from heterogeneous settings, where the set of genes being monitored may differ and expression levels may not be directly comparable even for the same gene. In this paper, we propose a model, F-cluster, for mining subspace coherent patterns from heterogeneous gene expression data, which is shown to be effective for revealing truthful patterns and reducing spurious ones. We also develop an efficient and scalable hybrid approach that combines gene-pair based and sample-pair based pruning to generate F-clusters from multiple gene expression matrices simultaneously. The experimental results demonstrate that our model can discover significant clusters that may not be identified by previous models.