Enhanced biclustering on expression data

Third IEEE Symposium on Bioinformatics and Bioengineering, 2003. Proceedings. Pub Date: 2003-03-10. DOI: 10.1109/BIBE.2003.1188969

Jiong Yang, Haixun Wang, Wei Wang, Philip S. Yu


Citations: 342

Abstract

Microarrays are one of the latest breakthroughs in experimental molecular biology. They provide a powerful tool for monitoring the expression patterns of thousands of genes simultaneously and are already producing huge amounts of valuable data. The concept of a bicluster was introduced by Cheng and Church (2000) to capture the coherence of a subset of genes under a subset of conditions. A set of heuristic algorithms was also designed to find either one bicluster or a set of biclusters; these algorithms iterate over masking null values and previously discovered biclusters, coarse and fine node deletion, node addition, and the inclusion of inverted data. These heuristics inevitably suffer from some serious drawbacks. Masking null values and discovered biclusters with random numbers may cause random interference, which in turn hinders the discovery of high-quality biclusters. To address this issue and to further accelerate the biclustering process, we generalize the bicluster model to incorporate null values and propose a probabilistic algorithm (FLOC) that can discover a set of k possibly overlapping biclusters simultaneously. Furthermore, this algorithm can easily be extended, at little additional cost, to support additional features that suit different requirements. An experimental study on yeast gene expression data shows that FLOC can offer substantial improvements over the previously proposed algorithm.
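The abstract does not spell out how the bicluster model is generalized to null values. As a rough illustration only, the sketch below computes Cheng and Church's mean squared residue, H(I, J) = (1/|I||J|) Σ (a_ij - a_iJ - a_Ij + a_IJ)², for a gene-by-condition submatrix, under the assumption that null entries (represented as None) are simply skipped: each row base a_iJ, column base a_Ij and bicluster base a_IJ, as well as the final average, uses only the specified entries, so the divisor becomes the number of specified cells rather than |I||J|. The helper `row_toggle_delta` is hypothetical: it measures how the residue changes when one gene (row) is added to or removed from a bicluster, the kind of membership move that node addition and deletion make. It is not FLOC's actual gain function, which the abstract does not describe.

```python
from collections import defaultdict
from typing import List, Optional, Sequence

# Hypothetical layout: rows are genes, columns are conditions,
# and None marks a null (missing) expression value.
Matrix = List[List[Optional[float]]]


def mean_squared_residue(data: Matrix, rows: Sequence[int], cols: Sequence[int]) -> float:
    """Mean squared residue of the bicluster rows x cols.

    Assumption for illustration: null entries are skipped, so every base
    (row, column, bicluster) and the final average use only the
    specified entries of the submatrix.
    """
    cells = [(i, j, data[i][j]) for i in rows for j in cols
             if data[i][j] is not None]
    if not cells:
        raise ValueError("bicluster has no specified entries")

    # Accumulate sums and counts for the row bases and column bases.
    row_sum, row_cnt = defaultdict(float), defaultdict(int)
    col_sum, col_cnt = defaultdict(float), defaultdict(int)
    for i, j, v in cells:
        row_sum[i] += v
        row_cnt[i] += 1
        col_sum[j] += v
        col_cnt[j] += 1

    base = sum(v for _, _, v in cells) / len(cells)  # bicluster base a_IJ
    total = 0.0
    for i, j, v in cells:
        # Residue r_ij = a_ij - a_iJ - a_Ij + a_IJ
        r = v - row_sum[i] / row_cnt[i] - col_sum[j] / col_cnt[j] + base
        total += r * r
    return total / len(cells)


def row_toggle_delta(data: Matrix, rows: Sequence[int], cols: Sequence[int], i: int) -> float:
    """Hypothetical helper: change in residue if row i is added or removed.

    A negative value means toggling row i makes the bicluster more coherent.
    """
    new_rows = [r for r in rows if r != i] if i in rows else list(rows) + [i]
    return mean_squared_residue(data, new_rows, cols) - mean_squared_residue(data, rows, cols)


if __name__ == "__main__":
    # Purely additive rows (each row is another row plus a constant):
    # the residue is 0 up to floating-point rounding.
    additive = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
    print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))
    # An incoherent extra gene; removing it lowers the residue (negative delta).
    noisy = additive + [[9.0, 0.0, 9.0]]
    print(row_toggle_delta(noisy, [0, 1, 2, 3], [0, 1, 2], 3))
```

Skipping nulls means a missing value contributes nothing to any base or to the residue, whereas filling it with a random number injects noise into every mean it touches. This is one way to read the abstract's concern about random interference, not a statement of the paper's exact definition.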


