S. Khedairia, Imene Houari, Manel Ababsia, Tarek Khadir
Proceedings of the 9th International Conference on Information Systems and Technologies. Published 2019-03-24. DOI: 10.1145/3361570.3361603 (https://doi.org/10.1145/3361570.3361603)
Clustering Ensemble Approach Based on Incremental Learning
A clustering ensemble combines multiple clustering results into a single consensus clustering that is potentially better and more robust than any individual result. The technique has proven effective at finding irregularly shaped clusters, handling noise, and integrating clustering solutions from multiple distributed sources. Consensus clustering methods based on a voting mechanism are widely used in the literature. The idea behind majority voting is that the judgement of a group is superior to that of its individual members. However, voting-based consensus methods struggle to assign an appropriate cluster label to data objects that receive no majority vote. To resolve this ambiguity, and to handle cases where datasets are too large or where new data can arrive dynamically at any time, we propose a new two-stage clustering approach. In the first stage, a clustering ensemble method based on relabeling and voting is used to cluster the data objects. This produces a new set of disjoint sub-clusters based on majority vote, where each data object votes for the cluster to which it belongs and for the corresponding cluster in each of the other clustering results. Data objects without a majority vote are collected in a new dataset. In the second stage, this new dataset and the previously obtained sub-clusters are processed by an incremental clustering algorithm, which is initialized with the sub-clusters and operates on the elements of the new dataset. The main advantage of incremental clustering methods is that the system can update its assumptions based on newly available data without re-examining old data. The proposed approach has been evaluated on several datasets, and the experimental results demonstrate its effectiveness and robustness.
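
The two stages described in the abstract could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how relabeling is done or which incremental algorithm is used, so the sketch swaps in a brute-force label permutation search (practical only for a small number of clusters), a strict-majority vote, and a nearest-centroid update with running means. The function names `relabel`, `vote`, and `incremental_assign`, and the toy data, are hypothetical.

```python
from collections import Counter
from itertools import permutations
import math


def relabel(reference, labels, k):
    """Rename the labels of one partition so it agrees best with `reference`.

    Assumption: brute force over all k! permutations, so small k only.
    """
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(1 for r, l in zip(reference, labels) if perm[l] == r)
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[l] for l in labels]


def vote(partitions, k):
    """Stage 1: align every partition to the first one, then vote per object.

    Returns one label per object, or None when no label gets a strict majority.
    """
    reference = partitions[0]
    aligned = [reference] + [relabel(reference, p, k) for p in partitions[1:]]
    consensus = []
    for votes in zip(*aligned):
        label, count = Counter(votes).most_common(1)[0]
        consensus.append(label if count > len(aligned) / 2 else None)
    return consensus


def incremental_assign(points, consensus, k):
    """Stage 2 (stand-in): seed centroids from the voted sub-clusters, then
    process undecided objects one at a time without revisiting old data.

    Assumption: at least one sub-cluster is non-empty after voting.
    """
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, consensus):
        if c is not None:
            counts[c] += 1
            for d in range(dim):
                sums[c][d] += p[d]
    final = list(consensus)
    for i, p in enumerate(points):
        if consensus[i] is not None:
            continue
        # Pick the nearest centroid among non-empty clusters.
        best = min(
            (j for j in range(k) if counts[j] > 0),
            key=lambda j: math.dist(p, [s / counts[j] for s in sums[j]]),
        )
        final[i] = best
        # Update the running sum so later objects see the new centroid.
        counts[best] += 1
        for d in range(dim):
            sums[best][d] += p[d]
    return final


# Toy data: two tight groups plus one ambiguous point at (2.0, 2.0).
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (2.0, 2.0)]
partitions = [
    [0, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 0],  # same grouping as the next one, labels swapped
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 1],
]
consensus = vote(partitions, k=2)
final = incremental_assign(points, consensus, k=2)
print(consensus)  # [0, 0, 0, 1, 1, None]
print(final)      # [0, 0, 0, 1, 1, 0]
```

In this toy run the last object receives a 2-2 split after relabeling, so it is left without a label in the first stage and is assigned in the second stage to the sub-cluster whose centroid is closest. A real relabeling step would more likely use an assignment solver than an exhaustive permutation search.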