Improving Semi-Supervised Clustering Algorithms with Active Query Selection
Walid Atwa, M. Emam
Advances in Systems Science and Applications, vol. 19, pp. 25-44 (journal article, published 2019-12-20)
DOI: 10.25728/ASSA.2019.19.4.659 (https://doi.org/10.25728/ASSA.2019.19.4.659)
Citations: 1
Abstract
Semi-supervised clustering algorithms use a small amount of supervised data, in the form of pairwise constraints, to improve clustering performance. However, most current algorithms are passive: the pairwise constraints are provided beforehand and selected at random. This can lead to constraints that are redundant, unnecessary, or even harmful to the clustering results. In this paper, we address the problem of constraint selection to improve the performance of semi-supervised clustering algorithms. Based on the concept of Maximum Mean Discrepancy (MMD), we select a batch of the most informative instances, namely those that minimize the difference in distribution between the labeled and unlabeled data. These instances are then queried against the existing neighborhoods to determine which neighborhood each one belongs to. Experimental comparisons with state-of-the-art methods on several real-world datasets demonstrate the effectiveness and efficiency of the proposed method.
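The abstract does not specify the kernel, the MMD estimator, or the exact query protocol, so the following is only a minimal Python sketch of the two steps it describes, under stated assumptions: a Gaussian (RBF) kernel with a hypothetical bandwidth parameter `gamma`, the biased empirical estimate of squared MMD, a greedy loop that adds one instance at a time, and a simulated oracle that answers "same cluster?" questions. The function names `rbf`, `mmd2`, `select_batch`, and `assign_to_neighborhood` are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]


def rbf(x: Point, y: Point, gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2).
    Assumption: the paper's kernel choice is not stated in the abstract."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def mmd2(X: Sequence[Point], Y: Sequence[Point], gamma: float = 1.0) -> float:
    """Biased empirical estimate of the squared Maximum Mean Discrepancy:
    mean k(X, X) + mean k(Y, Y) - 2 * mean k(X, Y)."""
    kxx = sum(rbf(a, b, gamma) for a in X for b in X) / len(X) ** 2
    kyy = sum(rbf(a, b, gamma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(rbf(a, b, gamma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2.0 * kxy


def select_batch(labeled: List[Point], unlabeled: List[Point],
                 batch_size: int, gamma: float = 1.0) -> List[int]:
    """Greedily pick unlabeled indices so that (labeled + picked) points
    match the remaining unlabeled pool as closely as possible in MMD.
    Assumption: a greedy approximation, not necessarily the paper's solver."""
    # Keep at least one point in the remaining pool so mmd2 is well defined.
    assert 0 < batch_size < len(unlabeled)
    selected: List[int] = []
    for _ in range(batch_size):
        best_i, best_score = -1, math.inf
        for i in range(len(unlabeled)):
            if i in selected:
                continue
            chosen = labeled + [unlabeled[j] for j in selected + [i]]
            rest = [p for j, p in enumerate(unlabeled)
                    if j not in selected and j != i]
            score = mmd2(chosen, rest, gamma)
            if score < best_score:
                best_i, best_score = i, score
        selected.append(best_i)
    return selected


def assign_to_neighborhood(
        idx: int, neighborhoods: List[List[int]],
        same_cluster: Callable[[int, int], bool],
) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Query instance idx against one representative of each existing
    neighborhood (hypothetical oracle `same_cluster`) and return the
    must-link and cannot-link pairs implied by the answers."""
    must: List[Tuple[int, int]] = []
    cannot: List[Tuple[int, int]] = []
    for hood in neighborhoods:
        rep = hood[0]  # first member serves as the representative
        if same_cluster(idx, rep):
            must.append((idx, rep))
            hood.append(idx)  # idx joins this neighborhood; stop querying
            return must, cannot
        cannot.append((idx, rep))
    neighborhoods.append([idx])  # no match: idx starts a new neighborhood
    return must, cannot


if __name__ == "__main__":
    # Toy data: two well-separated groups plus one point between them.
    labeled = [(0.0, 0.0), (5.0, 5.0)]
    unlabeled = [(0.1, 0.2), (0.3, -0.1), (4.9, 5.2), (5.1, 4.8), (2.5, 2.5)]
    batch = select_batch(labeled, unlabeled, batch_size=2, gamma=0.5)
    print("instances selected for querying:", batch)
```

In this sketch each selected instance is compared with neighborhoods in a fixed order and querying stops at the first "yes", so an instance costs at most one query per existing neighborhood. Ordering the neighborhoods by similarity to the instance would typically reduce the number of queries; whether the paper does this is not stated in the abstract.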
Journal description:
Advances in Systems Science and Applications (ASSA) is an international peer-reviewed open-source online academic journal. Its scope covers all major aspects of systems (and processes) analysis, modeling, simulation, and control, ranging from theoretical and methodological developments to a wide variety of application areas. Survey articles and innovative results are also welcome. ASSA is aimed at scientists, engineers, and researchers working on these problems. It aims to be a platform on which researchers can communicate and discuss both their specialized issues and interdisciplinary problems of systems analysis and its applications in science and industry, including data science, artificial intelligence, materials science, manufacturing, transportation, power and energy, ecology, corporate management, public governance, finance, and many others.