Title: A novel clustering technique for short texts
Authors: Neetu Singh, Narendra S. Chaudhari
Published in: 2016 5th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO)
Publication date: 2016-09-01
DOI: 10.1109/ICRITO.2016.7784956
Citations: 2
Abstract
We describe a novel technique for clustering short texts, such as URLs, without enriching them with external knowledge sources. The technique first performs feature clustering to identify the key features of the dataset and then reconstructs the dataset on the basis of those key features. Next, it computes the similarity between the short texts in the reconstructed dataset using similarity measures such as the Jaccard, Cosine, and Dice measures. Finally, it clusters the short texts using Spectral Clustering. We compare our method with the conventional Spectral Clustering method, which runs directly on the original short-text dataset. We performed experiments on a subset of the ODP dataset and on the WebKB dataset. The empirical results show an accuracy improvement of 21% over the Spectral Clustering method on the ODP dataset and 29.2% on the WebKB dataset.
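The similarity step named in the abstract can be illustrated with a minimal sketch. The abstract does not say how short texts are tokenized or whether features are weighted, so the sketch assumes each text is a set of lowercase alphanumeric tokens and uses the set-based forms of the three measures. The names `tokenize` and `similarity_matrix` and the example URLs are hypothetical and are not taken from the paper.

```python
import math
import re


def tokenize(text):
    """Split a short text (e.g. a URL) into a set of lowercase tokens.

    Assumption: splitting on non-alphanumeric characters; the paper's
    actual feature extraction is not described in the abstract.
    """
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for binary (set) features: |A ∩ B| / sqrt(|A||B|)."""
    denom = math.sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


def similarity_matrix(texts, measure):
    """Pairwise similarity matrix (list of lists) under the given measure."""
    token_sets = [tokenize(t) for t in texts]
    return [[measure(x, y) for y in token_sets] for x in token_sets]


# Hypothetical example URLs, chosen only for illustration.
urls = [
    "http://www.cs.example.edu/courses/ai",
    "http://www.cs.example.edu/faculty/smith",
]
matrix = similarity_matrix(urls, jaccard)
```

A symmetric matrix of this kind is the sort of affinity input a spectral clustering routine accepts, for example scikit-learn's `SpectralClustering` with `affinity="precomputed"`. Whether the authors used that library is not stated.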