{"title":"A Divide-and-Conquer Clustering Approach Based on Optimum-Path Forest","authors":"Adan Echemendia Montero, A. Falcão","doi":"10.1109/SIBGRAPI.2018.00060","DOIUrl":null,"url":null,"abstract":"Data clustering is one of the main challenges when solving Data Science problems. Despite its progress over almost one century of research, clustering algorithms still fail in identifying groups naturally related to the semantics of the problem. Moreover, the technological advances add crucial challenges with a considerable data increase, which are not handled by most techniques. We address these issues by proposing a divide-and-conquer approach to a clustering technique, which is unique in finding one group per dome of the probability density function of the data — the Optimum-Path Forest (OPF) clustering algorithm. Our approach can use all samples, or at least many samples, in the unsupervised learning process without affecting the grouping performance and, therefore, being less likely to lose relevant grouping information. We show that it can obtain satisfactory results when segmenting natural images into superpixels.","PeriodicalId":208985,"journal":{"name":"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 31st SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SIBGRAPI.2018.00060","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
Data clustering is one of the main challenges when solving Data Science problems. Despite its progress over almost one century of research, clustering algorithms still fail in identifying groups naturally related to the semantics of the problem. Moreover, the technological advances add crucial challenges with a considerable data increase, which are not handled by most techniques. We address these issues by proposing a divide-and-conquer approach to a clustering technique, which is unique in finding one group per dome of the probability density function of the data — the Optimum-Path Forest (OPF) clustering algorithm. Our approach can use all samples, or at least many samples, in the unsupervised learning process without affecting the grouping performance and, therefore, being less likely to lose relevant grouping information. We show that it can obtain satisfactory results when segmenting natural images into superpixels.