{"title":"An Efficient Algorithm for Mining Coherent Patterns from Heterogeneous Microarrays","authors":"Xiang Zhang, Wei Wang","doi":"10.1109/SSDBM.2007.30","journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)"},"publicationDate":"2007-07-09","citationCount":"2","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1109/SSDBM.2007.30"}
An Efficient Algorithm for Mining Coherent Patterns from Heterogeneous Microarrays
DNA microarray techniques give geneticists a way to monitor interactions among tens of thousands of genes simultaneously, and they have become standard lab routines in gene discovery, disease diagnosis, and drug design. There has been extensive research on coherent subspace clustering of gene expressions measured under consistent experimental settings. Consistent settings imply that all experiments are run using the same batch of microarray chips with similar noise characteristics. Algorithms developed under this assumption may not be applicable to data collected from heterogeneous settings, where the set of genes being monitored may differ and expression levels may not be directly comparable even for the same gene. In this paper, we propose a model, F-cluster, for mining subspace coherent patterns from heterogeneous gene expression data, which is shown to be effective at revealing true patterns and reducing spurious ones. We also develop an efficient and scalable hybrid approach that combines gene-pair-based and sample-pair-based pruning to generate F-clusters from multiple gene expression matrices simultaneously. The experimental results demonstrate that our model can discover significant clusters that previous models may fail to identify.
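The abstract does not define F-cluster, so the sketch below does not attempt to reproduce it. It only illustrates what a "coherent pattern" on a subspace usually means in the single-matrix setting the abstract contrasts against: a subset of genes whose expression profiles differ by a roughly constant shift across a subset of samples. The check is a pScore-style test in the spirit of the earlier pCluster model, and the names `max_pscore`, `is_coherent`, `delta`, and the toy matrix `expr` are all assumptions made for illustration.

```python
from itertools import combinations


def max_pscore(matrix, genes, samples):
    """Return the largest 2x2 shifting-pattern deviation in a submatrix.

    For every pair of genes and every pair of samples, compare how much
    each gene changes between the two samples. A perfectly shifted
    pattern gives a deviation of 0 for every such 2x2 block.
    """
    worst = 0.0
    for g1, g2 in combinations(genes, 2):
        for s1, s2 in combinations(samples, 2):
            change_g1 = matrix[g1][s1] - matrix[g1][s2]
            change_g2 = matrix[g2][s1] - matrix[g2][s2]
            worst = max(worst, abs(change_g1 - change_g2))
    return worst


def is_coherent(matrix, genes, samples, delta):
    """Hypothetical threshold test: coherent if no 2x2 block exceeds delta."""
    return max_pscore(matrix, genes, samples) <= delta


# Toy expression matrix (rows: genes, columns: samples); values are made up.
expr = [
    [1.0, 3.0, 2.0, 9.0],
    [4.0, 6.0, 5.0, 0.0],
    [2.0, 4.5, 3.0, 7.0],
]

if __name__ == "__main__":
    # Genes 0 and 1 differ by a constant shift on samples 0..2.
    print(max_pscore(expr, [0, 1], [0, 1, 2]))
    # Adding sample 3 breaks the pattern.
    print(is_coherent(expr, [0, 1], [0, 1, 2, 3], delta=1.0))
```

This brute-force check enumerates every gene pair and sample pair, which is why pair-based pruning of the kind the abstract mentions matters at scale. It also compares raw values within one matrix only; per the abstract, values from different heterogeneous matrices may not be directly comparable, which is the gap the F-cluster model is said to address.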