Comparative study on dimension reduction techniques for cluster analysis of microarray data
D. Araújo, A. Neto, A. Martins, J. Melo
The 2011 International Joint Conference on Neural Networks, 2011-10-03
https://doi.org/10.1109/IJCNN.2011.6033447
This paper studies the impact of dimension reduction techniques (DRTs) on the quality of partitions produced by cluster analysis of microarray datasets. We applied seven DRTs to four microarray cancer datasets and ran four clustering algorithms on both the original and the reduced datasets. Overall, using DRTs improved the performance of all algorithms tested, especially the hierarchical ones. Although Principal Component Analysis (PCA) is the most widely used DRT, it was outperformed by nonlinear methods and did not provide a substantial performance increase in the clustering algorithms. In contrast, t-distributed Stochastic Neighbor Embedding (t-SNE) and Laplacian Eigenmaps (LE) achieved good results for all datasets.
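The protocol described in the abstract (cluster the original data, cluster the reduced data, compare partition quality) can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' setup: the data are synthetic rather than microarray measurements, PCA (via power iteration, one component) stands in for the seven DRTs, k-means stands in for the four clustering algorithms, and the plain Rand index is used as the quality measure because the abstract does not name one. All function names are hypothetical.

```python
import math
import random


def make_data(n_per_class=20, n_features=50, seed=0):
    """Synthetic stand-in for an expression matrix (assumption, not real data)."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            # Only the first 5 features carry class signal; the rest are noise.
            row = [rng.gauss(3.0 * label if j < 5 else 0.0, 1.0)
                   for j in range(n_features)]
            X.append(row)
            y.append(label)
    return X, y


def pca_1d(X, n_iter=100):
    """Project each row onto the first principal component (power iteration)."""
    n, d = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - mean[j] for j in range(d)] for row in X]
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(n_iter):
        # Apply Xc^T Xc to v without forming the d x d matrix explicitly.
        scores = [sum(r[j] * v[j] for j in range(d)) for r in Xc]
        w = [sum(scores[i] * Xc[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return [[sum(r[j] * v[j] for j in range(d))] for r in Xc]


def kmeans(X, k=2, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(n_iter):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centers[c])))
            for x in X
        ]
        for c in range(k):
            members = [x for x, l in zip(X, labels) if l == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (unadjusted)."""
    n = len(a)
    agree = sum((a[i] == a[j]) == (b[i] == b[j])
                for i in range(n) for j in range(i + 1, n))
    return agree / (n * (n - 1) / 2)


X, y = make_data()
score_raw = rand_index(y, kmeans(X))          # cluster the original data
score_pca = rand_index(y, kmeans(pca_1d(X)))  # cluster the reduced data
print(f"Rand index, original: {score_raw:.3f}  reduced: {score_pca:.3f}")
```

Swapping `pca_1d` for another reduction method, or `kmeans` for a hierarchical algorithm, gives the other cells of the kind of comparison the abstract describes. The scores from this toy data say nothing about the paper's reported results.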