Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation.
Anindya Bhattacharya, Nirmalya Chowdhury, Rajat K De
DOI: 10.1504/ijdmb.2015.067322 (https://doi.org/10.1504/ijdmb.2015.067322). Published 2015-01-01. Journal Article.
Abstract
The performance of a clustering algorithm depends largely on the similarity measure selected, and how well that measure handles outliers is a major contributor to its success. The better a similarity measure captures similarity between genes in the presence of outliers, the better the clustering algorithm performs in forming biologically relevant groups of genes. In this article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concept of the Relative Sample Outlier (RSO). We formulate a new similarity measure, called Weighted Sample Similarity (WSS), incorporate it into Euclidean distance and the Pearson correlation coefficient, and then use the resulting measures in various clustering and biclustering algorithms to group gene expression profiles. Our results suggest that WSS improves the performance of all the considered clustering algorithms in terms of finding biologically relevant groups of genes.
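The abstract does not give the definitions of RSO or WSS, so the following is only a minimal sketch of the general idea of letting per-sample weights enter Euclidean distance and the Pearson correlation coefficient. The function names, the weight vectors, and the expression values are all hypothetical and are not the authors' formulation; in particular, how the paper derives weights from relative sample outliers is not stated here and is not reproduced.

```python
import math


def weighted_euclidean(x, y, w):
    """Euclidean distance in which sample i contributes w[i] * (x[i] - y[i])**2.

    The weights w are an assumption for illustration (non-negative, one per sample).
    """
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def weighted_pearson(x, y, w):
    """Pearson correlation computed from weighted means, variances and covariance.

    With all weights equal this reduces to the ordinary Pearson coefficient.
    Assumes neither profile is constant under the given weights.
    """
    total = sum(w)
    mx = sum(wi * xi for xi, wi in zip(x, w)) / total
    my = sum(wi * yi for yi, wi in zip(y, w)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w))
    vx = sum(wi * (xi - mx) ** 2 for xi, wi in zip(x, w))
    vy = sum(wi * (yi - my) ** 2 for yi, wi in zip(y, w))
    return cov / math.sqrt(vx * vy)


# Hypothetical expression profiles of two genes over five samples.
# The last sample of gene_b is an outlier.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [1.1, 2.1, 2.9, 4.2, 15.0]

uniform = [1.0, 1.0, 1.0, 1.0, 1.0]        # every sample counts equally
downweighted = [1.0, 1.0, 1.0, 1.0, 0.05]  # hypothetical low weight on the outlying sample

d_uniform = weighted_euclidean(gene_a, gene_b, uniform)
d_down = weighted_euclidean(gene_a, gene_b, downweighted)
r_uniform = weighted_pearson(gene_a, gene_b, uniform)
```

In this toy example, giving the outlying sample a small weight shrinks the distance between the two profiles, because that one sample no longer dominates the sum. The choice of weights is the substantive part of any such scheme, and the sketch deliberately leaves it as an input.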