Performance Analysis of Hard and Soft Clustering Approaches For Gene Expression Data
Authors: P. K. N. Banu, S. Andrews
Journal: Int. J. Rough Sets Data Anal.
DOI: 10.4018/ijrsda.2015010104 (https://doi.org/10.4018/ijrsda.2015010104)
Citations: 33
Abstract
Mining of gene expression data is growing rapidly as a way to predict gene expression patterns and to assist clinicians in the early diagnosis of tumor formation. Clustering is the most important phase of this analysis, because it helps find groups of genes that are highly expressed or suppressed. This paper analyses the performance of the most representative hard and soft off-line clustering algorithms on a brain tumor gene expression dataset: K-Means, Fuzzy C-Means, Self-Organizing Map (SOM) based clustering, and Genetic Algorithm (GA) based clustering. The clusters produced by these algorithms are indications of cellular processes. Clustering results are evaluated for compactness and separation using the Xie-Beni index (XB), the Davies-Bouldin index (DB), Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Dunn's Index (DI), along with the time taken. Experimental results show that soft clustering approaches work well for predicting clusters of highly expressed and suppressed genes.
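To make the hard versus soft distinction concrete, the following is a minimal NumPy sketch, not the authors' implementation. It runs K-Means and Fuzzy C-Means on a small synthetic matrix (two made-up, well-separated groups standing in for highly expressed and suppressed genes) and scores both with the Xie-Beni and Davies-Bouldin indices, where lower values mean more compact and better separated clusters. The data, the fuzzifier `m = 2`, the seeds, and all function names are assumptions chosen for illustration; the abstract does not give the paper's dataset or parameter settings.

```python
import numpy as np

def pairwise_dist(X, centers):
    """Euclidean distance from every row of X to every center."""
    return np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)

def kmeans(X, k, iters=100, seed=0):
    """Hard clustering: each row belongs to exactly one cluster."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        # Keep the old center if a cluster happens to become empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, pairwise_dist(X, centers).argmin(axis=1)

def fuzzy_cmeans(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Soft clustering: each row gets a membership degree in every cluster."""
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)          # memberships sum to 1 per row
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        d = np.fmax(pairwise_dist(X, centers), 1e-12)   # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def xie_beni(X, centers, U, m=2.0):
    """Membership-weighted compactness over n times the minimum squared center separation."""
    compactness = np.sum((U ** m) * pairwise_dist(X, centers) ** 2)
    center_d2 = pairwise_dist(centers, centers) ** 2
    np.fill_diagonal(center_d2, np.inf)        # ignore a center's distance to itself
    return compactness / (len(X) * center_d2.min())

def davies_bouldin(X, centers, labels):
    """Mean over clusters of the worst (scatter_j + scatter_k) / separation_jk ratio."""
    scatter = np.array([
        np.linalg.norm(X[labels == j] - centers[j], axis=1).mean()
        for j in range(len(centers))
    ])
    sep = pairwise_dist(centers, centers)
    np.fill_diagonal(sep, np.inf)              # self-comparison contributes a ratio of 0
    return ((scatter[:, None] + scatter[None, :]) / sep).max(axis=1).mean()

# Synthetic stand-in for an expression matrix (rows: genes, columns: conditions).
rng = np.random.default_rng(42)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),    # hypothetical "suppressed" group
    rng.normal(10.0, 0.5, size=(50, 2)),   # hypothetical "highly expressed" group
])

km_centers, km_labels = kmeans(X, k=2)
fcm_centers, U = fuzzy_cmeans(X, c=2)
one_hot = np.eye(2)[km_labels]             # hard labels written as 0/1 memberships

print("K-Means XB:", xie_beni(X, km_centers, one_hot),
      "DB:", davies_bouldin(X, km_centers, km_labels))
print("FCM     XB:", xie_beni(X, fcm_centers, U),
      "DB:", davies_bouldin(X, fcm_centers, U.argmax(axis=1)))
```

Writing the K-Means labels as a one-hot membership matrix lets the same `xie_beni` function score both algorithms, which is one simple way to put hard and soft results on a common scale. On data this cleanly separated the two methods are expected to produce the same partition; differences between them would only show up on overlapping clusters such as those in real expression data.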