PFClust: an optimised implementation of a parameter-free clustering algorithm
Khadija Musayeva, Tristan Henderson, John Bo Mitchell, Lazaros Mavridis
Source Code for Biology and Medicine, 9(1):5, published 2014-02-04
DOI: 10.1186/1751-0473-9-5
Citations: 8
Abstract
Background: A well-known problem in cluster analysis is finding the optimal number of clusters, one that reflects the inherent structure of the data. PFClust is a partitioning-based clustering algorithm that, unlike many widely used clustering algorithms, can automatically propose an optimal number of clusters for the data.
Results: Tests on various types of data showed that PFClust can discover clusters of arbitrary shapes, sizes and densities. The previous implementation of the algorithm had already been used successfully to cluster large macromolecular structures and small druglike compounds. We have greatly improved the algorithm's performance through a more efficient implementation, which enables PFClust to process large data sets acceptably fast.
Conclusions: In this paper we present a new optimized implementation of the PFClust algorithm that runs considerably faster than the original.
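The abstract does not describe how PFClust itself picks the number of clusters, so the sketch below does not reproduce it. As a hedged illustration of the underlying problem only, it swaps in a common generic approach: build an average-linkage partition for each candidate number of clusters k from a pairwise distance matrix, then keep the k with the highest mean silhouette width. The function names (`average_linkage`, `mean_silhouette`, `choose_k`) and the toy data are hypothetical; this is a pure-Python sketch under those assumptions, not PFClust's method or implementation.

```python
# Hypothetical illustration: generic silhouette-based choice of k.
# This is NOT the PFClust procedure; names and data are made up for the example.
from itertools import combinations
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def average_linkage(dist: Matrix, k: int) -> List[int]:
    """Agglomerate singletons with average linkage until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None  # (mean distance, index a, index b) of the closest pair
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a valid
        del clusters[b]
    labels = [0] * len(dist)
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels


def mean_silhouette(dist: Matrix, labels: List[int]) -> float:
    """Average silhouette width; points in singleton clusters score 0."""
    n = len(dist)
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton: contributes 0 by convention
        a = sum(dist[i][j] for j in own) / len(own)  # cohesion
        b = min(  # separation: mean distance to the nearest other cluster
            sum(dist[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


def choose_k(dist: Matrix, k_max: int = 0) -> int:
    """Return the k in [2, k_max] with the highest mean silhouette width.

    Assumes at least 3 points; k_max <= 0 means "try up to n - 1".
    """
    n = len(dist)
    k_max = n - 1 if k_max <= 0 else min(k_max, n - 1)
    best_k, best_score = 2, float("-inf")
    for k in range(2, k_max + 1):
        score = mean_silhouette(dist, average_linkage(dist, k))
        if score > best_score:
            best_k, best_score = k, score
    return best_k


# Usage: three well-separated groups of 1-D points.
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1]
dist = [[abs(p - q) for q in points] for p in points]
print(choose_k(dist))  # expected: 3
```

The naive merge loop recomputes every pairwise cluster distance at each step, so it only suits small inputs; it makes no attempt at the kind of efficiency for large data sets that the abstract reports for the new PFClust implementation.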
About the journal:
Source Code for Biology and Medicine is a peer-reviewed, open-access online journal that publishes articles on source code used across a wide range of applications in biology and medicine. The journal's aim is to publish source code for distribution and use in the public domain in order to advance biological and medical research. Through this dissemination, it may be possible to shorten the time required to solve certain computational problems for which source code availability or resources are limited.