Simultaneous clustering of multiple heterogeneous gene expression datasets
Authors: Basel Abu-Jamous, S. Kelly
Venue: 2017 IEEE Jordan Conference on Applied Electrical Engineering and Computing Technologies (AEECT)
Publication date: 2017-10-01
DOI: 10.1109/AEECT.2017.8257763 (https://doi.org/10.1109/AEECT.2017.8257763)
Abstract
By definition, a clustering algorithm partitions a given set of objects into clusters such that objects in the same cluster are similar to one another and dissimilar to objects in other clusters. Applying the most commonly used algorithms (e.g. k-means and Markov clustering) to three case studies of real gene expression data, we demonstrate that they do not always meet this objective. The problem becomes more significant when data with more dimensions are analysed, or when multiple datasets are analysed simultaneously. We solve this problem by proposing an automated consensus clustering algorithm, Clust, which can be applied to one or more datasets simultaneously and can identify clusters with higher within-cluster similarity and lower between-cluster similarity than other algorithms. Clust thus meets the basic definition of clustering more reliably and accurately.
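
The clustering objective stated in the abstract can be checked numerically for any given partition. The following is a minimal sketch of such a check, not the Clust algorithm itself. It assumes Pearson correlation between expression profiles as the similarity measure, which the abstract does not specify. The function name `within_between_similarity` and the toy data are hypothetical and serve only as illustration.

```python
import numpy as np


def within_between_similarity(profiles, labels):
    """Return (mean within-cluster, mean between-cluster) similarity.

    Assumption: similarity is the Pearson correlation between rows of
    `profiles` (one row per object, e.g. one gene's expression profile).
    Every cluster is assumed to contain at least two objects, and no row
    is constant; otherwise the averages are undefined (NaN).
    """
    profiles = np.asarray(profiles, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Pearson correlation between all objects (rows).
    corr = np.corrcoef(profiles)

    # same[i, j] is True when objects i and j share a cluster label.
    same = labels[:, None] == labels[None, :]
    # Exclude each object's trivial correlation with itself.
    off_diagonal = ~np.eye(len(labels), dtype=bool)

    within = corr[same & off_diagonal].mean()
    between = corr[~same].mean()
    return within, between


# Hypothetical toy data: two rising profiles and two falling profiles.
toy_profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.0, 2.5],
]

# A partition that groups like with like, and one that mixes them.
good_within, good_between = within_between_similarity(toy_profiles, [0, 0, 1, 1])
mixed_within, mixed_between = within_between_similarity(toy_profiles, [0, 1, 0, 1])
```

Under these assumptions, a partition that satisfies the definition gives a within-cluster value well above the between-cluster value, as for the labels `[0, 0, 1, 1]`. The mixed labels `[0, 1, 0, 1]` reverse that ordering, which is the kind of failure the abstract attributes to commonly used algorithms.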