Topic Detection from Microblog Based on Text Clustering and Topic Model Analysis
Authors: Siqi Huang, Yitao Yang, Huakang Li, Guozi Sun
Venue: 2014 Asia-Pacific Services Computing Conference
Published: 2014-12-04
DOI: 10.1109/APSCC.2014.18 (https://doi.org/10.1109/APSCC.2014.18)
Citations: 13
Abstract
This paper proposes a microblog topic detection method based on text clustering and topic model analysis. It addresses the problem that traditional topic detection methods are designed mainly for traditional media text and are not very effective on sparse, short microblog texts. Because microblog data is structured and carries rich inter-textual context, such as retweets, comments, user hashtags, and embedded URLs, we first put forward a feature-weight pre-processing method. We also use a clustering algorithm based on word vectors to enrich the feature information of the data. On this basis, we extend the conventional LDA (Latent Dirichlet Allocation) topic model to extract hot topics from the microblog data. Compared with traditional methods, the proposed method is much more effective on a text corpus collected from Sina Microblog.
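The abstract does not specify how the feature-weight pre-processing is computed, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a hypothetical scheme in which a term's count is boosted when the term appears as a hashtag and scaled by the post's retweet and comment counts, with embedded URLs removed. The function name `weighted_terms`, the `hashtag_boost` parameter, and the logarithmic popularity factor are all illustrative assumptions. Tokens are split on word boundaries, so Chinese text from Sina Microblog would first need a word segmenter.

```python
import math
import re
from collections import Counter

# Hypothetical patterns for hashtags and embedded links (assumptions, not from the paper).
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")


def weighted_terms(text, retweets=0, comments=0, hashtag_boost=2.0):
    """Return a term -> weight mapping for one post (illustrative weighting only)."""
    # Terms that the author marked as hashtags get a higher weight.
    hashtags = {tag.lower() for tag in HASHTAG_RE.findall(text)}
    # Drop embedded URLs so link fragments do not enter the vocabulary.
    cleaned = URL_RE.sub(" ", text).lower()
    counts = Counter(re.findall(r"\w+", cleaned))
    # Posts with more retweets and comments contribute more weight (assumed form).
    popularity = 1.0 + math.log1p(retweets + comments)
    return {
        term: count * popularity * (hashtag_boost if term in hashtags else 1.0)
        for term, count in counts.items()
    }


if __name__ == "__main__":
    post = "Flood in #Nanjing today http://t.cn/abc"
    print(weighted_terms(post, retweets=9, comments=0))
```

Weighted term vectors of this kind could then feed a clustering step or a topic model; how the paper combines them with word-vector clustering and its extended LDA is not described in the abstract.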