Selecting Representative Objects Considering Coverage and Diversity

Second International ACM Workshop on Managing and Mining Enriched Geo-Spatial Data. Pub Date: 2015-05-31. DOI: 10.1145/2786006.2786012

Shenlu Wang, M. A. Cheema, Ying Zhang, Xuemin Lin

Citations: 8

Abstract

We say that an object o attracts a user u if o is among the top-k objects under the preference function defined by u. Given a set of objects (e.g., restaurants) and a set of users, we study the problem of computing a set of representative objects under two criteria: coverage and diversity. The coverage of a set S of objects is the number of distinct users attracted by the objects in S. Although a set of objects with high coverage attracts a large number of users, all of these users may have quite similar preferences. Such a set may therefore appeal only to a specific class of users with similar preference functions and disappoint users whose preferences differ widely. The diversity criterion addresses this issue by selecting a set S of objects such that the set of users attracted by each object in S is as different as possible from the sets of users attracted by the other objects in S. Existing work on representative objects considers only one of the two criteria. We are the first to consider both, with a parameter that controls the relative importance of each. Our algorithm has two phases. In the first phase, we prune the objects that cannot be among the representative objects and compute the set of attracted users (also called the reverse top-k) for each remaining object. In the second phase, the reverse top-k sets of these objects are used to compute the representative objects that maximize coverage and diversity. Since this problem is NP-hard, the second phase employs a greedy algorithm. For time and space efficiency, we adopt MinHash and KMV synopses to assist the set operations. We prove that the proposed greedy algorithm is ϵ-approximate. Our extensive experimental study on real and synthetic data sets demonstrates the effectiveness of the proposed techniques.
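The two-phase idea in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration because the abstract does not specify them: preference functions are taken to be linear weight vectors, diversity is taken to be the average pairwise Jaccard distance between reverse top-k sets, the two criteria are combined as `lam * coverage + (1 - lam) * diversity` with coverage normalized by the number of users, and pruning simply drops objects that attract no user. Exact Python sets stand in for the MinHash and KMV synopses, and all function names are hypothetical.

```python
from itertools import combinations


def reverse_top_k(objects, users, k):
    """Phase 1 (brute force): map each object id to the set of users it attracts."""
    attracted = {oid: set() for oid in objects}
    for uid, weights in users.items():
        # Assumption: linear preference, score = dot(weights, attributes), higher is better.
        scores = {oid: sum(w * a for w, a in zip(weights, attrs))
                  for oid, attrs in objects.items()}
        top = sorted(scores, key=lambda oid: (-scores[oid], oid))[:k]
        for oid in top:
            attracted[oid].add(uid)
    return attracted


def coverage(selected, attracted):
    """Number of distinct users attracted by the selected objects."""
    covered = set()
    for oid in selected:
        covered |= attracted[oid]
    return len(covered)


def jaccard_distance(a, b):
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def diversity(selected, attracted):
    """Assumed measure: mean pairwise Jaccard distance of reverse top-k sets."""
    pairs = list(combinations(selected, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(attracted[x], attracted[y]) for x, y in pairs) / len(pairs)


def score(selected, attracted, n_users, lam):
    """Assumed objective: lam weights coverage, (1 - lam) weights diversity."""
    return (lam * coverage(selected, attracted) / n_users
            + (1 - lam) * diversity(selected, attracted))


def greedy_select(attracted, n_users, size, lam):
    """Phase 2: repeatedly add the object that gives the highest combined score."""
    # Assumed pruning rule: objects that attract no user are never candidates.
    candidates = {oid for oid, us in attracted.items() if us}
    selected = []
    while candidates and len(selected) < size:
        # Iterating in sorted order makes tie-breaking deterministic.
        best = max(sorted(candidates),
                   key=lambda oid: score(selected + [oid], attracted, n_users, lam))
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy data: four objects with two attributes, four users with weight vectors.
objects = {"a": (1.0, 0.0), "b": (0.0, 1.0), "c": (0.5, 0.5), "d": (0.1, 0.1)}
users = {"u1": (1.0, 0.0), "u2": (0.9, 0.1), "u3": (0.0, 1.0), "u4": (0.4, 0.6)}

attracted = reverse_top_k(objects, users, k=2)
balanced = greedy_select(attracted, len(users), size=2, lam=0.5)
diverse_only = greedy_select(attracted, len(users), size=2, lam=0.0)
```

On this toy input, object `c` is in every user's top-2, so a balanced greedy run picks it first even though the pair `a`, `b` would score higher under the assumed objective. This is the kind of gap that an approximation guarantee bounds.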
