LASTA: large scale topic assignment on multiple social networks

Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2014-08-24. DOI: 10.1145/2623330.2623350

Nemanja Spasojevic, Jinyun Yan, Adithya Rao, Prantik Bhattacharyya


Citations: 34

Abstract

Millions of people use social networks every day to talk about a variety of subjects, publish opinions and share information. Understanding this data to infer users' topical interests is a challenging problem with applications in various data-powered products. In this paper, we present 'LASTA' (Large Scale Topic Assignment), a full production system used at Klout, Inc., which mines topical interests from five social networks and assigns over 10,000 topics to hundreds of millions of users on a daily basis. The system continuously collects streams of user data and reacts to fresh information, updating topics for users as their interests shift. LASTA generates over 50 distinct features derived from signals such as user-generated posts and profiles, user reactions such as comments and retweets, user attributions such as lists, tags and endorsements, as well as signals based on social graph connections. We show that using this diverse set of features leads to a better representation of a user's topical interests than using only generated text or only graph-based features. We also show that using cross-network information for a user leads to a more complete and accurate understanding of the user's topics than using any single network. We evaluate LASTA's topic assignment system on an internal labeled corpus of 32,264 user-topic labels generated from real users.
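
The abstract's central idea, combining several kinds of signals from several networks into one ranked topic list per user, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the feature names, the weights, the network labels, and the weighted-sum scoring are hypothetical and are not taken from the paper, which does not specify its model in the abstract.

```python
from collections import defaultdict

# Hypothetical feature weights (assumption): the abstract names signal
# families (posts, reactions, attributions, graph) but gives no weights.
FEATURE_WEIGHTS = {
    "post_text": 1.0,
    "reaction": 0.5,
    "attribution": 2.0,
    "graph": 0.25,
}


def assign_topics(signals, top_k=3):
    """Rank topics for one user by a weighted sum of signal strengths.

    `signals` is an iterable of (network, feature, topic, strength) tuples.
    Signals from all networks are pooled, so a topic supported on several
    networks accumulates a higher score than one seen on a single network.
    """
    scores = defaultdict(float)
    for _network, feature, topic, strength in signals:
        # Unknown feature types contribute nothing.
        scores[topic] += FEATURE_WEIGHTS.get(feature, 0.0) * strength
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [topic for topic, _ in ranked[:top_k]]


# Toy input with placeholder network labels (assumption).
signals = [
    ("network_a", "post_text", "machine learning", 3.0),
    ("network_b", "attribution", "machine learning", 1.0),
    ("network_a", "reaction", "cooking", 2.0),
    ("network_c", "graph", "cycling", 4.0),
]

print(assign_topics(signals, top_k=2))
```

In this toy input, "machine learning" is supported by signals on two networks and therefore outranks topics seen on only one, which mirrors the cross-network argument made in the abstract. A production system would presumably learn its weights from labeled data rather than fix them by hand.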
