Top-k representative queries with binary constraints
Arijit Khan, Vishwakarma Singh
Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 29 June 2015
DOI: 10.1145/2791347.2791367 (https://doi.org/10.1145/2791347.2791367)
Citations: 2
Abstract
Given a collection of binary constraints that determine whether a data object is relevant, we consider the problem of online retrieval of the top-k objects that best represent all other relevant objects in the underlying dataset. Such top-k representative queries arise naturally in a wide range of complex data-analytic applications, including advertising, search, and recommendation. In this paper, we aim to identify top-k representative objects that are high-scoring, satisfy diverse subsets of the given binary constraints, and represent many other relevant objects in the dataset. We formulate the problem using the well-established notion of top-k representative skylines and show that it is NP-hard. We therefore design efficient techniques that solve the problem with theoretical performance guarantees. As a by-product of our algorithm, we also improve the asymptotic time complexity of skyline computation to log-linear in the number of data points when all dimensions except one are binary. Our empirical results show that the proposed method efficiently finds high-quality top-k representative objects and is an order of magnitude faster than state-of-the-art methods for finding top-k skylines with binary constraints.
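One widely used formalization of the top-k representative skyline picks the k skyline points that together dominate the largest number of objects. It is assumed here for illustration only; the paper's own objective, which also rewards diversity across constraint subsets, may differ. Under that assumption the selection is a maximum-coverage problem, so the standard greedy rule (repeatedly add the skyline point that dominates the most not-yet-covered objects) is guaranteed to reach at least a (1 - 1/e) fraction of the optimal coverage. The minimal Python sketch below is not the authors' method: it assumes each object is a hypothetical (score, mask) pair, where bit i of mask is set when the object satisfies binary constraint i, and that a higher score and more satisfied constraints are both preferred. The names `dominates` and `greedy_representatives` are illustrative.

```python
def dominates(a, b):
    """a, b are hypothetical (score, mask) pairs. a dominates b if it is at least
    as good on the score and on every binary constraint, and not identical."""
    sa, ma = a
    sb, mb = b
    at_least_as_good = sa >= sb and (ma | mb) == ma  # mask a is a superset of mask b
    strictly_better = sa > sb or ma != mb
    return at_least_as_good and strictly_better


def greedy_representatives(skyline, objects, k):
    """Greedy maximum coverage: pick up to k skyline points whose combined
    dominance sets cover the most objects. Returns (chosen points, #covered)."""
    # cover[j] = indices of the objects dominated by skyline point j
    cover = [{i for i, o in enumerate(objects) if dominates(p, o)} for p in skyline]
    chosen, covered = [], set()
    for _ in range(min(k, len(skyline))):
        # Largest marginal gain among points not chosen yet (ties: first index).
        best = max(
            (j for j in range(len(skyline)) if j not in chosen),
            key=lambda j: len(cover[j] - covered),
        )
        chosen.append(best)
        covered |= cover[best]
    return [skyline[j] for j in chosen], len(covered)
```

For example, with objects (10, 0b01), (9, 0b11), (8, 0b01), (7, 0b00), (6, 0b10), (5, 0b11) and skyline [(10, 0b01), (9, 0b11)], a budget of k = 1 selects (9, 0b11), which dominates four objects, while (10, 0b01) dominates only two. Computing every dominance set up front costs O(|skyline| · n) comparisons; it keeps the sketch short rather than fast.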
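To illustrate why a skyline with one numeric dimension and d binary dimensions can be computed in time log-linear in the number of points n (treating d as a constant), here is a minimal Python sketch. It is not the authors' algorithm. It assumes each object is a hypothetical (score, mask) pair, where bit i of mask is set when the object satisfies binary constraint i, and that a higher score and more satisfied constraints are both preferred. Objects are sorted by score once. A table of 2^d flags then records which masks are already covered by a superset mask from a strictly higher score. This gives O(n log n + n · 2^d) time overall. The names `submasks` and `binary_skyline` are illustrative.

```python
from itertools import groupby


def submasks(mask):
    """Yield every submask of `mask`, from `mask` itself down to 0."""
    s = mask
    while True:
        yield s
        if s == 0:
            return
        s = (s - 1) & mask


def binary_skyline(objects, d):
    """Skyline of hypothetical (score, mask) pairs with one numeric and d binary
    dimensions. a dominates b when a.score >= b.score, a.mask is a superset of
    b.mask, and the two pairs are not identical."""
    # covered[m] becomes True once an object from an already processed (strictly
    # higher) score level has a superset of m; the set of covered masks is
    # downward closed.
    covered = [False] * (1 << d)
    skyline = []
    ordered = sorted(objects, key=lambda o: o[0], reverse=True)  # O(n log n)
    for _, group in groupby(ordered, key=lambda o: o[0]):
        group = list(group)
        masks = {m for _, m in group}
        # Within one score level, only a strict superset mask dominates.
        below_peer = set()
        for g in masks:
            below_peer.update(s for s in submasks(g) if s != g)
        for obj in group:
            if not covered[obj[1]] and obj[1] not in below_peer:
                skyline.append(obj)
        # Mark this level's masks and all their submasks for lower score levels.
        for g in masks:
            if not covered[g]:  # if g is covered, its submasks are already marked
                for s in submasks(g):
                    covered[s] = True
    return skyline
```

For example, `binary_skyline([(10, 0b01), (9, 0b11), (8, 0b01)], 2)` keeps (10, 0b01) and (9, 0b11) and drops (8, 0b01), which (10, 0b01) dominates. Exact duplicates are both kept, since neither strictly dominates the other. The 2^d table is what makes the running time log-linear in n only when d is small.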