{"title":"Topic Detection in Chinese Microblogs Using Hot Term Discovery and Adaptive Spectral Clustering","authors":"Chengxu Ye, Ping Yang, Shaopeng Liu","doi":"10.1109/3PGCIC.2014.44","DOIUrl":null,"url":null,"abstract":"Weibo is a popular Chinese microblogging service that counts with millions of users and allows them to share short text messages. As an information network, Weibo can tell people what they care about as it is happening in the society. Unfortunately, users are constantly struggling to keep up with the larger and larger amounts of messages published every day. In order to help users to get the big picture, an efficient and effective topic detection method is urgent in demand. Considering the sheer scale and rapid evolution of the microblog messages, we investigate a novel method for topic detection in Chinese Microblogs in a given time period. It is composed of two major steps. First, hot terms are extracted by a suffix array structure and a TF*SDF term weighting scheme. Second, based on the extracted hot terms, we calculate their co-occurrence information and then group the terms into clusters that represent topics using an adaptive spectral clustering. 
Extensive experimental results on real world data demonstrate that the proposed method is more effective and efficient for topic detection in Chinese microblogs than existing approaches.","PeriodicalId":395610,"journal":{"name":"2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/3PGCIC.2014.44","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Weibo is a popular Chinese microblogging service with millions of users who share short text messages. As an information network, Weibo shows what people care about as events unfold in society. However, users struggle to keep up with the growing volume of messages published every day. To help users see the big picture, an efficient and effective topic detection method is urgently needed. Given the sheer scale and rapid evolution of microblog messages, we investigate a novel method for detecting topics in Chinese microblogs within a given time period. The method consists of two major steps. First, hot terms are extracted using a suffix array structure and a TF*SDF term weighting scheme. Second, we compute co-occurrence information for the extracted hot terms and group the terms into clusters that represent topics using adaptive spectral clustering. Extensive experiments on real-world data demonstrate that the proposed method is more effective and efficient for topic detection in Chinese microblogs than existing approaches.
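The abstract does not define the TF*SDF weights or the adaptive spectral clustering step, so the following is only a minimal sketch of the two generic building blocks it names: counting repeated character sequences with a suffix array (useful for Chinese text, which has no word delimiters) and counting message-level co-occurrence of the resulting terms. The function names, the thresholds, and the toy messages are hypothetical and are not taken from the paper. The suffix array is built by naive sorting for readability, not efficiency.

```python
from collections import Counter
from itertools import combinations

SEP = "\n"  # message separator; candidate terms may not span it


def repeated_substrings(messages, min_len=2, max_len=4, min_count=2):
    """Count character n-grams that occur at least min_count times.

    Hypothetical helper: it stands in for candidate-term discovery only
    and does not implement the paper's TF*SDF weighting.
    """
    text = SEP.join(messages)
    # Suffix array: start positions ordered by the suffix beginning there.
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    counts = Counter()
    for n in range(min_len, max_len + 1):
        # Suffixes sharing the same length-n prefix are adjacent in the
        # suffix array, so each run of equal prefixes is one candidate.
        run, size = None, 0
        for start in suffix_array:
            gram = text[start:start + n]
            if len(gram) < n or SEP in gram:
                continue  # too short, or crosses a message boundary
            if gram == run:
                size += 1
                continue
            if run is not None and size >= min_count:
                counts[run] = size
            run, size = gram, 1
        if run is not None and size >= min_count:
            counts[run] = size
    return counts


def cooccurrence(messages, terms):
    """Count, for each unordered term pair, the messages containing both."""
    pair_counts = Counter()
    for message in messages:
        present = sorted(t for t in terms if t in message)
        pair_counts.update(combinations(present, 2))
    return pair_counts


# Toy messages invented for illustration only.
messages = ["北京暴雨预警", "暴雨预警升级", "北京地铁停运", "地铁停运通知"]
candidates = repeated_substrings(messages)
pairs = cooccurrence(messages, ["北京", "暴雨预警", "地铁停运"])
print(candidates.most_common(5))
print(dict(pairs))
```

In a fuller pipeline, the pair counts would be normalized into a term similarity matrix and passed to a spectral clustering routine. How the number of clusters is chosen adaptively is not specified in the abstract and is left out here.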