Unsupervised Construction of Topic-Based Twitter Lists

2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing. Pub Date: 2012-09-03. DOI: 10.1109/SocialCom-PASSAT.2012.64

F. D. Villiers, M. Hoffmann, Steve Kroon


Citations: 3

Abstract

The Twitter lists feature was launched in late 2009 and enables the creation of curated groups of Twitter users. Any user can be a list author and decide the basis on which other users are added to a list. The most popular lists are those associated with a topic. Twitter lists can be a powerful organisation tool, but their adoption has been limited. The two main obstacles are the initial setup time and the effort of continual curation. In this paper we attempt to solve the first problem by applying unsupervised clustering algorithms to construct topic-based Twitter lists. We consider k-means and affinity propagation (AP) as clustering algorithms and evaluate them using two document representation techniques: the popular term frequency-inverse document frequency (TF-IDF) weighting and the latent Dirichlet allocation (LDA) topic model. We calculate the similarities for the clustering algorithms using five well-known similarity measures that have been used extensively in the text domain. The adjusted normalised information distance (ANID) was used to compare the clustering results yielded by k-means and affinity propagation. We found that careful selection of a similarity measure, combined with the LDA topic model, can provide a user with a sensible starting point for list creation.
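The pipeline described in the abstract (represent each user as a document, compute similarities, cluster) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each user is represented by the concatenated text of their tweets, uses TF-IDF only (no LDA), and uses cosine similarity as the similarity measure; the abstract does not name the five measures, so that choice is an assumption. The user names, tweet text, seed indices and all function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (dict: term -> weight) per document."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokeniser (assumption)
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Smoothed IDF keeps weights positive even for terms found in every document.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors that are already L2-normalised."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans_cosine(vectors, seeds, iterations=10):
    """k-means with cosine similarity; `seeds` are indices of the initial centroids."""
    centroids = [dict(vectors[i]) for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: cosine(v, centroids[k]))
            for v in vectors
        ]
        # Update step: each centroid becomes the normalised sum of its members.
        for k in range(len(centroids)):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == k:
                    total.update(v)  # adds the weights term by term
            if not total:
                continue  # keep the previous centroid if the cluster is empty
            norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
            centroids[k] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical users, each represented by the concatenated text of their tweets.
users = {
    "alice": "python code compiler python bug",
    "bob": "compiler bug code release",
    "carol": "football goal match league",
    "dave": "match goal football transfer",
}
vectors = tfidf_vectors(list(users.values()))
labels = kmeans_cosine(vectors, seeds=[0, 2])
suggested_lists = {}
for name, label in zip(users, labels):
    suggested_lists.setdefault(label, []).append(name)
print(suggested_lists)
```

Each resulting cluster is a candidate topic-based list that a user could then curate by hand. Fixed seeds keep the sketch deterministic; k-means results generally depend on initialisation, and affinity propagation would instead take the full pairwise similarity matrix as input and choose the number of clusters itself.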
