{"title":"tpe - autocluster:用于自动聚类的基于树的管道集成框架","authors":"Radwa El Shawi, S. Sakr","doi":"10.1109/ICDMW58026.2022.00149","DOIUrl":null,"url":null,"abstract":"Novel technologies in automated machine learning ease the complexity of building well-performed machine learning pipelines. However, these are usually restricted to supervised learning tasks such as classification and regression, while unsu-pervised learning, particularly clustering, remains a largely un-explored problem due to the ambiguity involved when evaluating the clustering solutions. Motivated by this shortcoming, in this paper, we introduce TPE-AutoClust, a genetic programming-based automated machine learning framework for clustering. TPE-AutoCl ust optimizes a series of feature preprocessors and machine learning models to optimize the performance on an unsupervised clustering task. TPE-AutoClust mainly consists of three main phases: meta-learning phase, optimization phase and clustering ensemble construction phase. The meta-learning phase suggests some instantiations of pipelines that are likely to perform well on a new dataset. These pipelines are used to warmstart the optimization phase that adopts a multi-objective optimization technique to select pipelines based on the Pareto front of the trade-off between the pipeline length and performance. The ensemble construction phase develops a collaborative mechanism based on a clustering ensemble to combine optimized pipelines based on different internal cluster validity indices and construct a well-performing solution for a new dataset. The proposed framework is based on scikit-learn with 4 preprocessors and 6 clustering algorithms. Extensive experiments are conducted on 27 real and synthetic benchmark datasets to validate the superiority of TPE-AutoCl ust. The results show that TPE-AutoClust outperforms the state-of-the-art techniques for building automated clustering solutions.","PeriodicalId":146687,"journal":{"name":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2022-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering\",\"authors\":\"Radwa El Shawi, S. Sakr\",\"doi\":\"10.1109/ICDMW58026.2022.00149\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Novel technologies in automated machine learning ease the complexity of building well-performed machine learning pipelines. However, these are usually restricted to supervised learning tasks such as classification and regression, while unsu-pervised learning, particularly clustering, remains a largely un-explored problem due to the ambiguity involved when evaluating the clustering solutions. Motivated by this shortcoming, in this paper, we introduce TPE-AutoClust, a genetic programming-based automated machine learning framework for clustering. TPE-AutoCl ust optimizes a series of feature preprocessors and machine learning models to optimize the performance on an unsupervised clustering task. TPE-AutoClust mainly consists of three main phases: meta-learning phase, optimization phase and clustering ensemble construction phase. The meta-learning phase suggests some instantiations of pipelines that are likely to perform well on a new dataset. These pipelines are used to warmstart the optimization phase that adopts a multi-objective optimization technique to select pipelines based on the Pareto front of the trade-off between the pipeline length and performance. The ensemble construction phase develops a collaborative mechanism based on a clustering ensemble to combine optimized pipelines based on different internal cluster validity indices and construct a well-performing solution for a new dataset. The proposed framework is based on scikit-learn with 4 preprocessors and 6 clustering algorithms. Extensive experiments are conducted on 27 real and synthetic benchmark datasets to validate the superiority of TPE-AutoCl ust. The results show that TPE-AutoClust outperforms the state-of-the-art techniques for building automated clustering solutions.\",\"PeriodicalId\":146687,\"journal\":{\"name\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE International Conference on Data Mining Workshops (ICDMW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDMW58026.2022.00149\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Data Mining Workshops (ICDMW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDMW58026.2022.00149","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
TPE-AutoClust: A Tree-based Pipline Ensemble Framework for Automated Clustering
Novel technologies in automated machine learning ease the complexity of building well-performed machine learning pipelines. However, these are usually restricted to supervised learning tasks such as classification and regression, while unsu-pervised learning, particularly clustering, remains a largely un-explored problem due to the ambiguity involved when evaluating the clustering solutions. Motivated by this shortcoming, in this paper, we introduce TPE-AutoClust, a genetic programming-based automated machine learning framework for clustering. TPE-AutoCl ust optimizes a series of feature preprocessors and machine learning models to optimize the performance on an unsupervised clustering task. TPE-AutoClust mainly consists of three main phases: meta-learning phase, optimization phase and clustering ensemble construction phase. The meta-learning phase suggests some instantiations of pipelines that are likely to perform well on a new dataset. These pipelines are used to warmstart the optimization phase that adopts a multi-objective optimization technique to select pipelines based on the Pareto front of the trade-off between the pipeline length and performance. The ensemble construction phase develops a collaborative mechanism based on a clustering ensemble to combine optimized pipelines based on different internal cluster validity indices and construct a well-performing solution for a new dataset. The proposed framework is based on scikit-learn with 4 preprocessors and 6 clustering algorithms. Extensive experiments are conducted on 27 real and synthetic benchmark datasets to validate the superiority of TPE-AutoCl ust. The results show that TPE-AutoClust outperforms the state-of-the-art techniques for building automated clustering solutions.