An unsupervised transfer learning approach to discover topics for online reputation management

Proceedings of the 22nd ACM International Conference on Information & Knowledge Management. Pub Date: 2013-10-27. DOI: 10.1145/2505515.2507845

Tamara Martín-Wanton, Julio Gonzalo, Enrique Amigó


Citations: 14

Abstract

Microblogs play an important role in Online Reputation Management. Companies and organizations are increasingly interested in obtaining up-to-the-minute information about the emerging topics that concern their reputation. In this paper, we present a new technique to cluster a collection of tweets posted within a short time span about a specific entity. Our approach relies on transfer learning: it contextualizes a target collection of tweets with a large set of unlabeled "background" tweets that help improve the clustering of the target collection. We include background tweets together with target tweets in a TwitterLDA process, and we set only the total number of clusters. In practice, this means that the system can adapt to find the right number of clusters for the target data, overcoming one of the limitations of LDA-based approaches (the need to establish the number of clusters a priori). Our experiments using RepLab 2012 data show that using the background collection gives a 20% improvement over a direct application of TwitterLDA using only the target collection. Our data also confirm that the approach can effectively predict the right number of target clusters in a way that is robust with respect to the total number of clusters established a priori.
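
The idea in the abstract can be illustrated with a small sketch. This is not the authors' TwitterLDA implementation. It swaps in a simpler one-topic-per-tweet mixture of unigrams fitted by collapsed Gibbs sampling, and it leaves out TwitterLDA's per-user topic distributions and background-word switch. The function name `cluster_with_background`, the hyperparameters `alpha` and `beta`, and the default values are all assumptions made for illustration. Target and background tweets are pooled, only the total number of clusters `k_total` is fixed, and the number of target clusters is read off afterwards as the number of distinct clusters that target tweets occupy.

```python
import math
import random
from collections import Counter


def cluster_with_background(target, background, k_total=10, alpha=0.1,
                            beta=0.1, iters=30, seed=0):
    """Cluster target tweets jointly with background tweets.

    target, background: lists of token lists (one list per tweet).
    Returns one cluster label per target tweet. Simplified stand-in for
    TwitterLDA: each tweet belongs to exactly one cluster.
    """
    rng = random.Random(seed)
    docs = list(target) + list(background)       # pooled collection
    vocab_size = len({w for d in docs for w in d})
    z = [rng.randrange(k_total) for _ in docs]   # random initial labels
    m = [0] * k_total                            # tweets per cluster
    n = [0] * k_total                            # tokens per cluster
    nw = [Counter() for _ in range(k_total)]     # word counts per cluster
    for d, k in zip(docs, z):
        m[k] += 1
        n[k] += len(d)
        nw[k].update(d)

    for _ in range(iters):
        for i, d in enumerate(docs):
            # Remove tweet i from its current cluster.
            k = z[i]
            m[k] -= 1
            n[k] -= len(d)
            nw[k].subtract(d)
            # Log-probability of each cluster given all other assignments.
            log_p = []
            for c in range(k_total):
                lp = math.log(m[c] + alpha)
                seen = Counter()                 # repeats within this tweet
                for j, w in enumerate(d):
                    lp += math.log(nw[c][w] + seen[w] + beta)
                    lp -= math.log(n[c] + j + vocab_size * beta)
                    seen[w] += 1
                log_p.append(lp)
            # Sample a new cluster (shift by the max for numerical safety).
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            k = rng.choices(range(k_total), weights=weights)[0]
            z[i] = k
            m[k] += 1
            n[k] += len(d)
            nw[k].update(d)

    return z[:len(target)]                       # labels of target tweets only
```

Given the returned `labels`, `len(set(labels))` is the number of clusters the target tweets ended up using. In this sketch that number can be smaller than `k_total` because some clusters may be occupied only by background tweets, which is the adaptive behaviour the abstract describes.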


