提高聚类基因性能的相对样本离群值(RSO)和加权样本相似性(WSS)概念:协同功能和协同调控。

IF 0.4 4区生物学 Q4 MATHEMATICAL & COMPUTATIONAL BIOLOGY

International Journal of Data Mining and Bioinformatics Pub Date : 2015-01-01 DOI:10.1504/ijdmb.2015.067322

Anindya Bhattacharya, Nirmalya Chowdhury, Rajat K De

{"title":"提高聚类基因性能的相对样本离群值(RSO)和加权样本相似性(WSS)概念:协同功能和协同调控。","authors":"Anindya Bhattacharya, Nirmalya Chowdhury, Rajat K De","doi":"10.1504/ijdmb.2015.067322","DOIUrl":null,"url":null,"abstract":"Performance of clustering algorithms is largely dependent on selected similarity measure. Efficiency in handling outliers is a major contributor to the success of a similarity measure. Better the ability of similarity measure in measuring similarity between genes in the presence of outliers, better will be the performance of the clustering algorithm in forming biologically relevant groups of genes. In the present article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concepts of Relative Sample Outlier (RSO). We formulate new similarity, called Weighted Sample Similarity (WSS), incorporated in Euclidean distance and Pearson correlation coefficient and then use them in various clustering and biclustering algorithms to group different gene expression profiles. Our results suggest that WSS improves performance, in terms of finding biologically relevant groups of genes, of all the considered clustering algorithms.","PeriodicalId":54964,"journal":{"name":"International Journal of Data Mining and Bioinformatics","volume":"11 3","pages":"314-30"},"PeriodicalIF":0.4000,"publicationDate":"2015-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067322","citationCount":"0","resultStr":"{\"title\":\"Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation.\",\"authors\":\"Anindya Bhattacharya, Nirmalya Chowdhury, Rajat K De\",\"doi\":\"10.1504/ijdmb.2015.067322\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Performance of clustering algorithms is largely dependent on selected similarity measure. Efficiency in handling outliers is a major contributor to the success of a similarity measure. Better the ability of similarity measure in measuring similarity between genes in the presence of outliers, better will be the performance of the clustering algorithm in forming biologically relevant groups of genes. In the present article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concepts of Relative Sample Outlier (RSO). We formulate new similarity, called Weighted Sample Similarity (WSS), incorporated in Euclidean distance and Pearson correlation coefficient and then use them in various clustering and biclustering algorithms to group different gene expression profiles. Our results suggest that WSS improves performance, in terms of finding biologically relevant groups of genes, of all the considered clustering algorithms.\",\"PeriodicalId\":54964,\"journal\":{\"name\":\"International Journal of Data Mining and Bioinformatics\",\"volume\":\"11 3\",\"pages\":\"314-30\"},\"PeriodicalIF\":0.4000,\"publicationDate\":\"2015-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1504/ijdmb.2015.067322\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Journal of Data Mining and Bioinformatics\",\"FirstCategoryId\":\"99\",\"ListUrlMain\":\"https://doi.org/10.1504/ijdmb.2015.067322\",\"RegionNum\":4,\"RegionCategory\":\"生物学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"MATHEMATICAL & COMPUTATIONAL BIOLOGY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of Data Mining and Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1504/ijdmb.2015.067322","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

摘要

聚类算法的性能很大程度上取决于所选择的相似度度量。处理异常值的效率是相似性度量成功的主要因素。在异常值存在的情况下，相似性度量在测量基因之间相似性方面的能力越好，聚类算法在形成生物相关基因群方面的性能就越好。在本文中，我们讨论了用不同的现有相似性度量来处理异常值的问题，并引入了相对样本异常值(RSO)的概念。我们提出了新的相似度，称为加权样本相似度(WSS)，结合欧几里得距离和Pearson相关系数，然后使用它们在各种聚类和双聚类算法中对不同的基因表达谱进行分组。我们的研究结果表明，在寻找生物学上相关的基因群方面，WSS提高了所有考虑的聚类算法的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Concepts of relative sample outlier (RSO) and weighted sample similarity (WSS) for improving performance of clustering genes: co-function and co-regulation.

Performance of clustering algorithms is largely dependent on selected similarity measure. Efficiency in handling outliers is a major contributor to the success of a similarity measure. Better the ability of similarity measure in measuring similarity between genes in the presence of outliers, better will be the performance of the clustering algorithm in forming biologically relevant groups of genes. In the present article, we discuss the problem of handling outliers with different existing similarity measures and introduce the concepts of Relative Sample Outlier (RSO). We formulate new similarity, called Weighted Sample Similarity (WSS), incorporated in Euclidean distance and Pearson correlation coefficient and then use them in various clustering and biclustering algorithms to group different gene expression profiles. Our results suggest that WSS improves performance, in terms of finding biologically relevant groups of genes, of all the considered clustering algorithms.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

International Journal of Data Mining and Bioinformatics 生物-数学与计算生物学

CiteScore

1.00

自引率

0.00%

发文量

审稿时长

>12 weeks

期刊介绍： Mining bioinformatics data is an emerging area at the intersection between bioinformatics and data mining. The objective of IJDMB is to facilitate collaboration between data mining researchers and bioinformaticians by presenting cutting edge research topics and methodologies in the area of data mining for bioinformatics. This perspective acknowledges the inter-disciplinary nature of research in data mining and bioinformatics and provides a unified forum for researchers/practitioners/students/policy makers to share the latest research and developments in this fast growing multi-disciplinary research area.