Significance analysis of clustering high throughput biological data

2005 IEEE International Conference on Electro Information Technology. Pub Date: 2005-05-22. DOI: 10.1109/EIT.2005.1627001

H. Otu, Shakirahmed Kolia, Jon Jones, Osman Osman, T. Libermann, Beth Israel

Abstract

In the post-genomic era, the availability of complete genome sequences has given rise to high-throughput systems such as gene chips and protein arrays. These techniques are revolutionizing our understanding of biology by probing thousands of biological entities simultaneously. Unsupervised classification and clustering have emerged as important methods of analysis; they can be used to group samples with similar molecular profiles and/or molecules with similar expression profiles. However, techniques such as hierarchical clustering, k-means, and self-organizing maps (SOM) have been used extensively with little attention to the significance of their results. We propose a general method that uses the bootstrap technique to assign confidence levels to clustering results for high-throughput biological data. We apply the proposed method to real genomics and proteomics data on renal cell cancer (RCC), the most common malignancy of the adult kidney. We use protein profiles, obtained by surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI TOF-MS), from IL-2 treatment responders and non-responders among metastatic RCC patients. We also use gene expression data from Affymetrix HG-U133A chips for primary RCC tumors to examine the Union Internationale Contre le Cancer (UICC) TNM classification.
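The abstract does not say what is resampled or how confidence is computed, so the following is only a minimal sketch of one common way to attach bootstrap confidence to a clustering, not the authors' procedure. It assumes that features (columns) are resampled with replacement, that samples are reclustered on each replicate, and that the confidence for a pair of samples is the fraction of replicates in which they fall in the same cluster. The function names, the average-linkage clustering, and the toy data are all hypothetical choices made for illustration.

```python
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def cluster(rows, k):
    """Average-linkage agglomerative clustering; returns one label per row."""
    groups = [[i] for i in range(len(rows))]
    while len(groups) > k:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Mean pairwise distance between the two candidate groups.
                d = sum(euclidean(rows[i], rows[j])
                        for i in groups[a] for j in groups[b])
                d /= len(groups[a]) * len(groups[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        groups[a] += groups.pop(b)  # b > a, so index a stays valid
    labels = [0] * len(rows)
    for label, members in enumerate(groups):
        for i in members:
            labels[i] = label
    return labels


def bootstrap_coclustering(rows, k, n_boot=200, seed=0):
    """Fraction of bootstrap replicates in which each pair of rows co-clusters.

    Assumption: the bootstrap resamples features (columns), not samples.
    """
    rng = random.Random(seed)
    n, m = len(rows), len(rows[0])
    together = [[0] * n for _ in range(n)]
    for _ in range(n_boot):
        cols = [rng.randrange(m) for _ in range(m)]  # sample with replacement
        resampled = [[row[c] for c in cols] for row in rows]
        labels = cluster(resampled, k)
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / n_boot for count in line] for line in together]


# Hypothetical toy data: six samples, five features, two well-separated groups.
data = [
    [1.0, 1.2, 0.9, 1.1, 1.0],
    [1.1, 0.9, 1.0, 1.2, 0.8],
    [0.9, 1.0, 1.1, 0.9, 1.2],
    [5.0, 5.2, 4.9, 5.1, 5.0],
    [5.1, 4.9, 5.0, 5.2, 4.8],
    [4.9, 5.0, 5.1, 4.9, 5.2],
]
confidence = bootstrap_coclustering(data, k=2)
```

Entries of `confidence` near 1 mark pairs that stay together under resampling, and entries near 0 mark pairs that are consistently separated. Intermediate values flag groupings that depend on which features happened to be drawn, which is the kind of result a significance analysis is meant to expose.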
