Statistical Analysis of Clustering Performances of NMF, Spectral Clustering, and K-means: With Gene Selection
Andri Mirzal
2020 2nd International Conference on Computer and Information Sciences (ICCIS), 2020-10-13
DOI: 10.1109/ICCIS49240.2020.9257702
The use of statistical tests to determine the significance of performance differences between clustering algorithms is still uncommon. This is an important task because such a test can determine whether one algorithm is statistically better than another. Using statistical tests to determine the significance of performance gains or losses after applying a processing step such as feature selection to the datasets is even less common. The first task has been addressed in our other work [1], and the second task is the topic of this paper. In this study, nonnegative matrix factorization (NMF), spectral clustering, and k-means are used as clustering methods; LS (Laplacian Score), SPEC (SPECtral), and SPFS (Similarity Preserving Feature Selection) are used as feature selection mechanisms; and eleven microarray gene expression datasets are used to evaluate the performance of the clustering methods. The experimental results show that, on average, only LS yields a statistically significant improvement in clustering performance, SPEC seems to offer no advantage, and SPFS instead lowers clustering performance. Because applying selection mechanisms is expensive, these results raise the question of whether they are worth using for selecting genes in microarray datasets.
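The abstract does not name the statistical test that was used. As a minimal sketch only, the following assumes an exact paired sign-flip permutation test, which is one common way to check whether a processing step such as feature selection changes a per-dataset score. The function name `paired_sign_flip_test` and the score values in the demo are hypothetical placeholders and are not taken from the paper.

```python
from itertools import product


def paired_sign_flip_test(before, after):
    """Exact two-sided sign-flip permutation test on paired scores.

    Assumption: this is an illustrative test, not necessarily the one
    used in the paper. Under the null hypothesis that the processing
    step has no effect, each paired difference is equally likely to be
    positive or negative, so all 2**n sign patterns are enumerated.

    Returns (mean_difference, p_value).
    """
    if len(before) != len(after):
        raise ValueError("before and after must have the same length")
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    observed = abs(sum(diffs))
    hits = 0
    for signs in product((1, -1), repeat=n):
        total = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point rounding.
        if total >= observed - 1e-12:
            hits += 1
    return sum(diffs) / n, hits / 2 ** n


if __name__ == "__main__":
    # Hypothetical placeholder scores for eleven datasets (NOT real results):
    # clustering quality without and with a feature selection step.
    without_selection = [0.50, 0.62, 0.48, 0.71, 0.55, 0.60,
                         0.44, 0.67, 0.52, 0.58, 0.63]
    with_selection = [0.53, 0.61, 0.52, 0.74, 0.57, 0.66,
                      0.43, 0.70, 0.55, 0.60, 0.68]
    mean_diff, p_value = paired_sign_flip_test(without_selection, with_selection)
    print(f"mean gain = {mean_diff:.4f}, p = {p_value:.4f}")
```

With eleven datasets the enumeration covers only 2048 sign patterns, so the exact test is cheap. For larger collections a sampled permutation test or a rank-based paired test would be the usual substitute.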