Efficient Graph Models for Retrieving Top-k News Feeds from Ego Networks

2012 International Conference on Privacy, Security, Risk and Trust and 2012 International Conference on Social Computing. Pub Date: 2012-09-03. DOI: 10.1109/SocialCom-PASSAT.2012.73

Rene Pickhardt, Thomas Gottron, A. Scherp, Steffen Staab, J. Kunze

Citations: 3

Abstract

A key challenge for web platforms such as social networking sites and news feed aggregation services is the efficient and targeted distribution of new content items to users. This can be formulated as the problem of retrieving the top-k news items from the d-degree ego network of each given user, where the set of all users producing feeds is of size n, with n ≫ d ≫ k and typically k < 20. Existing approaches either employ expensive join operations on global indices or suffer from high redundancy through denormalization. This makes retrieving different top-k news feeds for thousands of users per second very inefficient in a large social network. In this paper, we propose two graph models, GRAPHITY and STOU, to address this problem. GRAPHITY is optimized for fast retrieval of news feeds and has a runtime of O(k log(k)). The GRAPHITY index does not involve data redundancy. Updating the index when a new item is inserted into a feed takes time linear in the node's indegree d_in. STOU stores new content in O(1), at the cost of a slower retrieval time of O(d log(d)). We verify the theoretical runtime complexity of GRAPHITY and STOU on two data sets of different characteristics and size. We show that on a single machine GRAPHITY is able to retrieve more than 10 000 unique news feeds per second in a network with more than one million users. Our evaluation confirms that retrieval of news feeds with GRAPHITY is independent of the node degree d of a user's ego network and of the network size n, and that it scales to networks of arbitrary size.
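The trade-off described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the class names `Stou` and `Graphity`, the helper `top_k_merge`, the in-memory dictionaries, and the logical clock are all assumptions made for illustration. The write-optimized model appends an item in O(1) and sorts all d followees at read time; the read-optimized model keeps each user's ego list ordered by most recent update, so a post touches every follower (linear in d_in) and a read only needs a heap of at most about k entries.

```python
import heapq
from collections import OrderedDict, defaultdict
from itertools import count, islice


def top_k_merge(authors, items, k):
    """Return the k newest texts. `authors` must be ordered by their newest
    item, newest first, and every listed author must have at least one item."""
    feed, heap, admitted = [], [], 0

    def admit():
        # Lazily add the next author's newest item to the heap.
        nonlocal admitted
        if admitted < len(authors):
            a = authors[admitted]
            pos = len(items[a]) - 1
            heapq.heappush(heap, (-items[a][pos][0], admitted, pos))
            admitted += 1

    admit()
    while heap and len(feed) < k:
        _, idx, pos = heapq.heappop(heap)
        a = authors[idx]
        feed.append(items[a][pos][1])
        if pos > 0:  # next older item of the same author
            heapq.heappush(heap, (-items[a][pos - 1][0], idx, pos - 1))
        if idx == admitted - 1 and pos == len(items[a]) - 1:
            admit()  # newest item of the last admitted author was consumed
    return feed


class Stou:
    """Hypothetical write-optimized model: O(1) post, O(d log d) read."""

    def __init__(self):
        self.follows = defaultdict(set)   # user -> authors the user follows
        self.items = defaultdict(list)    # author -> [(timestamp, text)], oldest first
        self.clock = count()              # logical timestamps (assumption)

    def follow(self, user, author):
        self.follows[user].add(author)

    def post(self, author, text):
        self.items[author].append((next(self.clock), text))

    def feed(self, user, k):
        active = [a for a in self.follows[user] if self.items[a]]
        active.sort(key=lambda a: self.items[a][-1][0], reverse=True)
        return top_k_merge(active, self.items, k)


class Graphity:
    """Hypothetical read-optimized model: post is linear in the indegree."""

    def __init__(self):
        self.followers = defaultdict(set)     # author -> users following the author
        self.ego = defaultdict(OrderedDict)   # user -> followees, most recently updated first
        self.items = defaultdict(list)
        self.clock = count()

    def follow(self, user, author):
        self.followers[author].add(user)
        if self.items[author]:
            # Simplification: re-sort the ego list when following an active author.
            ego = self.ego[user]
            ego[author] = None
            ordered = sorted(ego, key=lambda a: self.items[a][-1][0], reverse=True)
            self.ego[user] = OrderedDict.fromkeys(ordered)

    def post(self, author, text):
        self.items[author].append((next(self.clock), text))
        for user in self.followers[author]:   # one O(1) move per follower
            ego = self.ego[user]
            ego[author] = None
            ego.move_to_end(author, last=False)

    def feed(self, user, k):
        # Only the first k ego entries can contribute to the top-k feed.
        return top_k_merge(list(islice(self.ego[user], k)), self.items, k)
```

In this sketch both models return the same feed for the same sequence of `follow` and `post` calls; they differ only in where the ordering work is done. The lazy admission in `top_k_merge` is one way to keep the heap small, chosen here for brevity rather than taken from the paper.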
