M. Rafi, Fizza Abid, Hamza Mustafa Khan, Anum Mirza
Towards the Constraint Learning and Optimization Approach to Document Clustering
2021 6th International Multi-Topic ICT Conference (IMTIC), published 2021-11-10
DOI: 10.1109/IMTIC53841.2021.9719790 (https://doi.org/10.1109/IMTIC53841.2021.9719790)
Citations: 0
Abstract
This research proposes autonomous constraint learning from a document collection and incorporates the learned constraints into an effective document clustering process. Constrained clustering is a semi-supervised approach to document clustering in which some prior knowledge about the collection is readily available. The paper proposes sampling-based algorithms to find three different kinds of constraints in the document collection: (i) instance level, (ii) cluster level, and (iii) corpus level. Integrated into constrained K-Means, these constraints produce multiple clusters that satisfy them. A boosting method is suggested to adaptively learn the constraints' priorities, the constraint satisfaction criteria, and the optimal clustering solution from multiple possible solutions. The proposed algorithms are implemented and tested on a standard text mining dataset. The evaluation measures are purity, entropy, and F-measure. The experiments achieved encouraging results for constraint learning, with an average improvement of 6% in clustering results.
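The abstract names purity, entropy, and F-measure as evaluation measures but does not give their formulas. The sketch below shows one common way to compute them from true class labels and predicted cluster ids. It is an illustration under assumptions, not the authors' implementation: the function names, the base-2 logarithm, the size-weighted averaging, and the toy data are all hypothetical choices, and the paper may use different variants (for example, a normalized entropy).

```python
from collections import Counter
from math import log2


def contingency(labels, clusters):
    """Count documents per (cluster, class) pair."""
    table = {}
    for cls, clu in zip(labels, clusters):
        table.setdefault(clu, Counter())[cls] += 1
    return table


def purity(labels, clusters):
    """Fraction of documents in the majority class of their cluster (higher is better)."""
    table = contingency(labels, clusters)
    return sum(max(c.values()) for c in table.values()) / len(labels)


def entropy(labels, clusters):
    """Size-weighted average of per-cluster class entropy, in bits (lower is better)."""
    table = contingency(labels, clusters)
    n = len(labels)
    total = 0.0
    for counts in table.values():
        size = sum(counts.values())
        h = -sum((k / size) * log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


def f_measure(labels, clusters):
    """Class-size-weighted best F1 of each class over all clusters (higher is better)."""
    table = contingency(labels, clusters)
    class_sizes = Counter(labels)
    n = len(labels)
    score = 0.0
    for cls, cls_size in class_sizes.items():
        best = 0.0
        for counts in table.values():
            hit = counts.get(cls, 0)
            if hit == 0:
                continue  # this cluster holds no documents of the class
            precision = hit / sum(counts.values())
            recall = hit / cls_size
            best = max(best, 2 * precision * recall / (precision + recall))
        score += (cls_size / n) * best
    return score


# Hypothetical toy data: six documents, two true classes, two predicted clusters.
true_labels = ["sport", "sport", "sport", "tech", "tech", "tech"]
predicted = [0, 0, 1, 1, 1, 1]

print(purity(true_labels, predicted))
print(entropy(true_labels, predicted))
print(f_measure(true_labels, predicted))
```

On the toy data, five of the six documents fall in the majority class of their cluster, so purity is 5/6. Comparing these three scores for a constrained and an unconstrained clustering of the same collection is one way an improvement such as the reported 6% could be measured, although the abstract does not state the exact protocol.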