Representing documents through their readers

Khalid El-Arini, Min Xu, E. Fox, Carlos Guestrin
{"title":"Representing documents through their readers","authors":"Khalid El-Arini, Min Xu, E. Fox, Carlos Guestrin","doi":"10.1145/2487575.2487596","DOIUrl":null,"url":null,"abstract":"From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers? To address this question, we model the content of news articles and blog posts by attributes of the people who are likely to share them. For example, many Twitter users describe themselves in a short profile, labeling themselves with phrases such as \"vegetarian\" or \"liberal.\" By assuming that a user's labels correspond to topics in the articles he shares, we can learn a labeled dictionary from a training corpus of articles shared on Twitter. Thereafter, we can code any new document as a sparse non-negative linear combination of user labels, where we encourage correlated labels to appear together in the output via a structured sparsity penalty. Finally, we show that our approach yields a novel document representation that can be effectively used in many problem settings, from recommendation to modeling news dynamics. For example, while the top politics stories will change drastically from one month to the next, the \"politics\" label will still be there to describe them. We evaluate our model on millions of tweeted news articles and blog posts collected between September 2010 and September 2012, demonstrating that our approach is effective.","PeriodicalId":20472,"journal":{"name":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2013-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"17","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2487575.2487596","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 17

Abstract

From Twitter to Facebook to Reddit, users have become accustomed to sharing the articles they read with friends or followers on their social networks. While previous work has modeled what these shared stories say about the user who shares them, the converse question remains unexplored: what can we learn about an article from the identities of its likely readers? To address this question, we model the content of news articles and blog posts by attributes of the people who are likely to share them. For example, many Twitter users describe themselves in a short profile, labeling themselves with phrases such as "vegetarian" or "liberal." By assuming that a user's labels correspond to topics in the articles he shares, we can learn a labeled dictionary from a training corpus of articles shared on Twitter. Thereafter, we can code any new document as a sparse non-negative linear combination of user labels, where we encourage correlated labels to appear together in the output via a structured sparsity penalty. Finally, we show that our approach yields a novel document representation that can be effectively used in many problem settings, from recommendation to modeling news dynamics. For example, while the top politics stories will change drastically from one month to the next, the "politics" label will still be there to describe them. We evaluate our model on millions of tweeted news articles and blog posts collected between September 2010 and September 2012, demonstrating that our approach is effective.
通过读者表示文档
从Twitter到Facebook再到Reddit,用户已经习惯于在社交网络上与朋友或关注者分享他们读到的文章。虽然之前的工作已经模拟了这些共享故事对分享它们的用户的影响,但相反的问题仍然没有被探索:我们可以从一篇文章的可能读者的身份中了解到什么?为了解决这个问题,我们根据可能分享它们的人的属性对新闻文章和博客文章的内容进行建模。例如,许多Twitter用户在简短的简介中描述自己,给自己贴上“素食主义者”或“自由主义者”等短语的标签。假设用户的标签与他分享的文章中的主题相对应,我们可以从Twitter上分享的文章训练语料库中学习一个有标签的字典。此后,我们可以将任何新文档编码为用户标签的稀疏非负线性组合,我们通过结构化稀疏性惩罚鼓励相关标签在输出中一起出现。最后,我们展示了我们的方法产生了一种新颖的文档表示,可以有效地用于许多问题设置,从推荐到新闻动态建模。例如,虽然每个月的热门政治新闻都会发生巨大变化,但“政治”这个标签仍然会用来描述它们。我们对2010年9月至2012年9月间收集的数百万篇推特新闻文章和博客文章进行了评估,证明我们的方法是有效的。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信