Matheus Campos Fernandes, T. Covões, André Luiz Vizine Pereira
{"title":"Active Learning for Evolutionary Constrained Clustering","authors":"Matheus Campos Fernandes, T. Covões, André Luiz Vizine Pereira","doi":"10.1109/BRACIS.2019.00037","DOIUrl":null,"url":null,"abstract":"The high cost of labeling data for analysis has increased interest in semi-supervised learning. One of its most common types is constrained clustering, which is a type of learning that does not rely on class labels for a group of objects. Instead, there is only information if some pairs of objects must be in the same cluster or in different clusters. In some applications, identifying such constraints involves reduced cost since it is less information than a class label. At the same time, Active Learning (AL) aims to minimize the cost of creating labeled datasets, trying to identify which unlabeled data are more relevant for using during the learning process, considering the labels that are already available. This paper proposes three AL strategies to an evolutionary constrained clustering algorithm (FIECE-EM) based on Gaussian Mixture Models (GMM). Experiments were executed on 10 well-known datasets, as a way to measure the impacts of each strategy. We compare the results with baseline supervised algorithms as well as COBRAS, a state-of-the-art Active Learning algorithm for constrained clustering. Two of the proposed strategies obtained significantly better results than COBRAS in our empirical evaluation. 
Thus, the combination of FIECE-EM with these strategies can be considered viable alternatives for AL in a constrained clustering setting.","PeriodicalId":335206,"journal":{"name":"Brazilian Conference on Intelligent Systems","volume":"32 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Brazilian Conference on Intelligent Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/BRACIS.2019.00037","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citation count: 2
Abstract
The high cost of labeling data for analysis has increased interest in semi-supervised learning. One of its most common forms is constrained clustering, which does not rely on class labels for a group of objects. Instead, the only information available is whether some pairs of objects must be in the same cluster or in different clusters. In some applications, identifying such constraints costs less than obtaining class labels, since a constraint carries less information than a label. Active Learning (AL), in turn, aims to minimize the cost of creating labeled datasets by identifying which unlabeled data are most relevant to the learning process, given the labels already available. This paper proposes three AL strategies for an evolutionary constrained clustering algorithm (FIECE-EM) based on Gaussian Mixture Models (GMM). Experiments were run on 10 well-known datasets to measure the impact of each strategy. We compare the results with baseline supervised algorithms as well as COBRAS, a state-of-the-art AL algorithm for constrained clustering. Two of the proposed strategies obtained significantly better results than COBRAS in our empirical evaluation. Thus, combining FIECE-EM with these strategies can be considered a viable alternative for AL in a constrained clustering setting.
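To make the pairwise constraints mentioned in the abstract concrete, here is a minimal Python sketch. It is not FIECE-EM, COBRAS, or any of the proposed AL strategies. It only illustrates, under our own assumptions, how must-link and cannot-link pairs can be represented and how a cluster assignment can be checked against them. The helper names `constraints_from_labels` and `count_violations` and the toy data are hypothetical.

```python
from itertools import combinations


def constraints_from_labels(labels):
    """Derive pairwise constraints from class labels (hypothetical helper).

    Returns (must_link, cannot_link) as sets of index pairs (i, j), i < j.
    """
    must_link, cannot_link = set(), set()
    for i, j in combinations(range(len(labels)), 2):
        if labels[i] == labels[j]:
            must_link.add((i, j))    # same class: same cluster required
        else:
            cannot_link.add((i, j))  # different class: different clusters
    return must_link, cannot_link


def count_violations(assignment, must_link, cannot_link):
    """Count the constraints that a cluster assignment fails to satisfy."""
    broken_ml = sum(1 for i, j in must_link if assignment[i] != assignment[j])
    broken_cl = sum(1 for i, j in cannot_link if assignment[i] == assignment[j])
    return broken_ml + broken_cl


# Toy data (assumed for illustration): four objects, two classes.
labels = ["a", "a", "b", "b"]
must_link, cannot_link = constraints_from_labels(labels)

# Cluster ids need not match class names; only co-membership matters.
clustering_ok = [1, 1, 0, 0]
clustering_bad = [0, 1, 1, 0]

print(count_violations(clustering_ok, must_link, cannot_link))
print(count_violations(clustering_bad, must_link, cannot_link))
```

The sketch also shows why a constraint carries less information than a class label: the constraints can be derived from the labels, but they only determine which objects belong together, not which class each group is.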