Topic Detection in Chinese Microblogs Using Hot Term Discovery and Adaptive Spectral Clustering

Chengxu Ye, Ping Yang, Shaopeng Liu
DOI: 10.1109/3PGCIC.2014.44
Published in: 2014 Ninth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing, 2014-11-08
Citations: 2

Abstract

Weibo is a popular Chinese microblogging service with millions of users who share short text messages. As an information network, Weibo shows people what they care about as events unfold in society. However, users struggle to keep up with the ever-growing volume of messages published every day. To help users see the big picture, an efficient and effective topic detection method is urgently needed. Considering the sheer scale and rapid evolution of microblog messages, we investigate a novel method for detecting topics in Chinese microblogs within a given time period. The method consists of two major steps. First, hot terms are extracted using a suffix array structure and a TF*SDF term weighting scheme. Second, we compute co-occurrence information for the extracted hot terms and group them into clusters, each representing a topic, using adaptive spectral clustering. Extensive experiments on real-world data demonstrate that the proposed method detects topics in Chinese microblogs more effectively and efficiently than existing approaches.
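The abstract names a suffix array as the structure behind hot term extraction but gives no further detail. As a rough illustration only, the Python sketch below shows one common way a suffix array plus an LCP array (Kasai's algorithm) can enumerate substrings that recur across posts, recording raw term frequency and post-level document frequency, and then counts how many posts contain each pair of terms. The separator character, the length and frequency thresholds, and the names `repeated_terms` and `cooccurrence` are hypothetical. The TF*SDF weighting and the adaptive spectral clustering step are not reproduced, because the abstract does not define them.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical separator; assumed never to occur inside a post.
SEP = "\x00"


def build_suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort start positions by their suffix.
    O(n^2 log n) time and O(n^2) memory; adequate for a toy example only."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def build_lcp(text: str, sa: list[int]) -> list[int]:
    """Kasai's algorithm: lcp[r] is the length of the longest common prefix
    of suffixes sa[r - 1] and sa[r]; lcp[0] is 0."""
    n = len(text)
    rank = [0] * n
    for r, pos in enumerate(sa):
        rank[pos] = r
    lcp = [0] * n
    h = 0
    for pos in range(n):
        if rank[pos] > 0:
            prev = sa[rank[pos] - 1]
            while pos + h < n and prev + h < n and text[pos + h] == text[prev + h]:
                h += 1
            lcp[rank[pos]] = h
            if h > 0:
                h -= 1  # the LCP for position pos + 1 is at least h - 1
        else:
            h = 0
    return lcp


def repeated_terms(posts: list[str], min_len: int = 2, max_len: int = 4,
                   min_df: int = 2) -> dict[str, tuple[int, int]]:
    """Return {term: (term_frequency, document_frequency)} for every substring
    of min_len..max_len characters that occurs in at least min_df posts."""
    text = SEP.join(posts) + SEP
    doc_of = []  # doc_of[i] = index of the post that text position i belongs to
    for d, post in enumerate(posts):
        doc_of.extend([d] * (len(post) + 1))  # +1 covers the trailing separator
    sa = build_suffix_array(text)
    lcp = build_lcp(text, sa)
    stats = {}
    for n in range(min_len, max_len + 1):
        r = 1
        while r < len(sa):
            if lcp[r] < n:
                r += 1
                continue
            start = r - 1
            # Extend the run of adjacent suffixes sharing at least n characters.
            while r < len(sa) and lcp[r] >= n:
                r += 1
            # Suffixes sa[start:r] now all begin with the same n-character term.
            term = text[sa[start]:sa[start] + n]
            if SEP in term:  # skip matches that span a post boundary
                continue
            docs = {doc_of[sa[k]] for k in range(start, r)}
            if len(docs) >= min_df:
                stats[term] = (r - start, len(docs))
    return stats


def cooccurrence(posts: list[str], terms: list[str]) -> dict[tuple[str, str], int]:
    """Count, for each unordered term pair, how many posts contain both terms."""
    counts = defaultdict(int)
    for post in posts:
        present = sorted(t for t in terms if t in post)
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return dict(counts)


if __name__ == "__main__":
    posts = ["北京下雨了", "北京暴雨", "今天北京下雨"]
    terms = repeated_terms(posts)
    print(terms)
    print(cooccurrence(posts, list(terms)))
```

On the three toy posts in the `__main__` block, 北京 is reported with a term frequency and document frequency of 3, and 北京下雨 with 2 for each. Nested and fragmentary substrings such as 北京下 and 京下 are reported too, so a fuller pipeline would presumably prune or merge them, apply its own term weighting, and pass the resulting co-occurrence counts to the clustering step as an affinity matrix.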