Similarity and cluster analysis algorithms for microarrays using R* trees

2005 IEEE Computational Systems Bioinformatics Conference - Workshops (CSBW'05). Pub Date: 2005-08-08. DOI: 10.1109/CSBW.2005.125

Jiaxiong Pi, Yong Shi, Zhengxin Chen


Citations: 4

Abstract

Similarity and cluster analysis are important aspects of analyzing microarray data. We treat microarray data as time series and carry out both similarity analysis and cluster analysis by indexing these time series with R*-Trees. We have developed algorithms for similarity and cluster analysis on microarray data and conducted experimental and comparative studies. First, our study shows that principal component analysis (PCA) preserves distances better than several other methods, such as DFT and PAA. We have developed a PCA-based similarity analysis tool. Compared with the other methods, it explores fewer R*-Tree nodes before finding similar counterparts and returns fewer false positives. We also extend the application of R*-Trees to cluster analysis. Using R*-Tree indexing, we propose two clustering algorithms, KMeans-R and Hierarchy-R, as improved versions of K-Means and hierarchical clustering, respectively. Experiments on similarity search and cluster analysis with the proposed algorithms have shown favorable results. This paper reports experiments on a yeast cell cycle dataset.
