Unsupervised discovery of fuzzy patterns in gene expression data

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Pub Date: 2010-12-01, DOI: 10.1109/BIBM.2010.5706575

Gene P. K. Wu, Keith C. C. Chan, A. Wong, Bin Wu


Citations: 4

Abstract

When the tissue classes of the samples are given, discovering patterns from gene expression levels is regarded as a classification problem. It is solved as a discrete-data problem by discretizing the expression levels of each gene into intervals that maximize the interdependence between that gene and the class labels. However, when class information is unavailable, discovering gene expression patterns becomes difficult. This paper attempts to tackle this important problem. For a gene pool with a large number of genes, we first cluster the genes into smaller groups. In each group, we use the representative gene, the one with the highest interdependence with the others in the group, to drive the discretization of the expression levels of the other genes. Treating the resulting intervals as discrete events, we can then discover association patterns. If the gene groups obtained are crisp clusters, significant patterns that span different clusters cannot be found. This paper presents a new method of “fuzzifying” the crisp attribute clusters for that purpose. To evaluate the effectiveness of our approach, we first apply the procedure described above to a synthetic dataset and then to a gene expression dataset with known class labels. The class labels are not used in either analysis; they serve only afterwards as the ground truth in a classification problem for assessing the algorithm's effectiveness in fuzzy gene clustering and discretization. The results show the efficacy of the proposed method.
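
The pipeline in the abstract (group genes, pick a representative gene per group, then let genes belong to several groups) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not name the interdependence measure, the initial binning, or the fuzzification rule, so the sketch assumes plain mutual information as a stand-in for interdependence, equal-width binning, and memberships proportional to a gene's mutual information with each group's representative. All function names are hypothetical.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins=3):
    """Discretize continuous expression levels into equal-width intervals.

    Assumption: equal-width binning is only a placeholder for the
    interdependence-driven discretization mentioned in the abstract.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp the maximum value into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences.

    Assumption: used here as a stand-in for the paper's interdependence measure.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def representative_gene(group):
    """Return the gene with the highest total interdependence with the rest.

    `group` maps a gene name to its list of discretized expression levels.
    """
    def total(gene):
        return sum(
            mutual_information(group[gene], group[other])
            for other in group
            if other != gene
        )
    return max(group, key=total)


def fuzzy_memberships(gene_levels, representatives):
    """Assign a gene a membership degree in every cluster.

    Assumption: the degree is the gene's mutual information with each
    cluster's representative, normalized to sum to 1. The abstract does
    not specify the actual fuzzification rule.
    """
    scores = {
        cluster: mutual_information(gene_levels, rep_levels)
        for cluster, rep_levels in representatives.items()
    }
    total = sum(scores.values())
    if total == 0:
        # No dependence on any representative: spread membership evenly.
        return {cluster: 1 / len(scores) for cluster in scores}
    return {cluster: score / total for cluster, score in scores.items()}


if __name__ == "__main__":
    # Toy, made-up expression values for three genes over six samples.
    raw = {
        "geneA": [0.1, 0.2, 0.5, 0.6, 0.9, 1.0],
        "geneB": [0.0, 0.1, 0.8, 0.9, 0.8, 1.0],
        "geneC": [0.2, 0.1, 0.2, 0.1, 0.9, 1.0],
    }
    discrete = {name: equal_width_bins(vals) for name, vals in raw.items()}
    print("representative:", representative_gene(discrete))
```

With crisp clusters a gene would be assigned only to its highest-scoring group. Keeping the full membership vector is what would allow a pattern involving genes from different groups to be considered, which is the motivation the abstract gives for fuzzification.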

