Muhammad Rafi, Bilal Aamer, Mubashir Naseem, M. Osama
{"title":"Solving document clustering problem through meta heuristic algorithm: black hole","authors":"Muhammad Rafi, Bilal Aamer, Mubashir Naseem, M. Osama","doi":"10.1145/3184066.3184085","DOIUrl":null,"url":null,"abstract":"The paper proposed a soft computing approach to solve document clustering problem. Document clustering is a specialized clustering problem in which textual documents autonomously segregated to a number of identifiable, subject homogenous and smaller sub-collections (also called clusters). Identifying implicit textual patterns within the documents is a challenging aspect as there can be thousands of such textual features. Partition clustering algorithm like k-means is mainly used for this problem. There are several drawbacks in k-means algorithm such as (i) initial seeds dependency, and (ii) it traps into local optimal solution. Although every k-means solution may contain some good partial arrangements for clustering. Meta-heuristic algorithm like Black Hole (BH) uses certain trade-off of randomization and local search for finding the optimal and near optimal solution. Our motivation comes from the fact that meta-heuristic optimization can quickly produce a global optimal solution using random k-means initial solution. The contributions from this research are (i) an implementation of black hole algorithm using k-mean as embedding (ii) The phenomena of global search and local search optimization are used as parameters adjustments. A series of experiments are performed with our proposed method on standard text mining datasetslike: (i) NEWS20, (ii) Reuters and (iii) WebKB and results are evaluated on Purity and Silhouette Index. In comparison the proposed method outperforms the basic k-means, GA with k-means embedding and quickly converges to global or near global optimal solution.","PeriodicalId":109559,"journal":{"name":"International Conference on Machine Learning and Soft Computing","volume":"261 ","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Machine Learning and Soft Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3184066.3184085","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4
Abstract
The paper proposed a soft computing approach to solve document clustering problem. Document clustering is a specialized clustering problem in which textual documents autonomously segregated to a number of identifiable, subject homogenous and smaller sub-collections (also called clusters). Identifying implicit textual patterns within the documents is a challenging aspect as there can be thousands of such textual features. Partition clustering algorithm like k-means is mainly used for this problem. There are several drawbacks in k-means algorithm such as (i) initial seeds dependency, and (ii) it traps into local optimal solution. Although every k-means solution may contain some good partial arrangements for clustering. Meta-heuristic algorithm like Black Hole (BH) uses certain trade-off of randomization and local search for finding the optimal and near optimal solution. Our motivation comes from the fact that meta-heuristic optimization can quickly produce a global optimal solution using random k-means initial solution. The contributions from this research are (i) an implementation of black hole algorithm using k-mean as embedding (ii) The phenomena of global search and local search optimization are used as parameters adjustments. A series of experiments are performed with our proposed method on standard text mining datasetslike: (i) NEWS20, (ii) Reuters and (iii) WebKB and results are evaluated on Purity and Silhouette Index. In comparison the proposed method outperforms the basic k-means, GA with k-means embedding and quickly converges to global or near global optimal solution.