Unsupervised Construction of Topic-Based Twitter Lists

F. D. Villiers, M. Hoffmann, Steve Kroon
{"title":"Unsupervised Construction of Topic-Based Twitter Lists","authors":"F. D. Villiers, M. Hoffmann, Steve Kroon","doi":"10.1109/SocialCom-PASSAT.2012.64","DOIUrl":null,"url":null,"abstract":"The Twitter lists feature was launched in late 2009 and enables the creation of curated groups containing Twitter users. Each user can be a list author and decide the basis on which other users are added to a list. The most popular lists are those that associate with a topic. Twitter lists can be used as a powerful organisation tool, but its widespread adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate these algorithms using two document representation techniques. The selected representation techniques are the popular term frequency-inverse document frequency (TF-IDF) and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. The adjusted normalised information distance (ANID) was used to compare the clustering result yielded by k-means and affinity propagation. We found that the careful selection of a similarity measure, combined with the LDA topic model can provide a user with a sensible starting point for list creation.","PeriodicalId":129526,"journal":{"name":"2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing","volume":"44 4 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Confernece on Social Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SocialCom-PASSAT.2012.64","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

The Twitter lists feature was launched in late 2009 and enables the creation of curated groups containing Twitter users. Each user can be a list author and decide the basis on which other users are added to a list. The most popular lists are those that associate with a topic. Twitter lists can be used as a powerful organisation tool, but its widespread adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate these algorithms using two document representation techniques. The selected representation techniques are the popular term frequency-inverse document frequency (TF-IDF) and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. The adjusted normalised information distance (ANID) was used to compare the clustering result yielded by k-means and affinity propagation. We found that the careful selection of a similarity measure, combined with the LDA topic model can provide a user with a sensible starting point for list creation.
基于主题的Twitter列表的无监督构建
Twitter列表功能于2009年底推出,可以创建包含Twitter用户的精选群组。每个用户都可以是列表作者,并决定将其他用户添加到列表的基础。最受欢迎的是那些与某个主题相关的列表。Twitter列表可以作为一种强大的组织工具,但它的广泛采用受到限制。两个主要障碍是初始设置时间和持续管理的努力。在本文中,我们试图通过应用无监督聚类算法来构建基于主题的Twitter列表来解决第一个问题。我们考虑k-means和affinity propagation (AP)作为聚类算法,并使用两种文档表示技术评估这些算法。所选择的表示技术是流行的术语频率-逆文档频率(TF-IDF)和潜在狄利克雷分配(LDA)主题模型。我们使用在文本领域广泛使用的五种众所周知的相似度度量来计算聚类算法的相似度。使用调整后的归一化信息距离(ANID)来比较k-means和亲和传播产生的聚类结果。我们发现,仔细选择相似度度量,结合LDA主题模型,可以为用户提供一个合理的列表创建起点。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信