Interactive Search for One of the Top-k

Proceedings of the 2021 International Conference on Management of Data, Pub Date: 2021-06-09, DOI: 10.1145/3448016.3457322

Weicheng Wang, R. C. Wong, Min Xie


Citations: 8

Abstract


Given a large dataset, a user cannot reasonably read every tuple one by one to find satisfactory ones. The traditional top-k query finds the best k tuples (i.e., the top-k tuples) w.r.t. the user's preference. In practice, however, it is difficult for a user to specify this preference explicitly. We study how to enhance the top-k query with user interaction. Specifically, we ask the user several questions, each of which presents two tuples and asks the user to indicate which one they prefer. Based on this feedback, the user's preference is learned implicitly, and one of the top-k tuples w.r.t. the learned preference is returned. We do not follow the top-k query in returning all top-k tuples, since doing so requires heavy user effort during the interaction (e.g., answering many questions); instead, we reduce the output size to strike a trade-off between user effort and output size. To achieve this, we present an algorithm, 2D-PI, which asks an asymptotically optimal number of questions in a 2-dimensional space, and two algorithms, HD-PI and RH, with provable performance guarantees in a d-dimensional space (d ≥ 2), which focus on the number of questions asked and the execution time, respectively. Experiments on synthetic and real datasets show that our algorithms outperform existing ones, asking fewer questions in less time to return satisfactory tuples.
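To make the interaction loop concrete, here is a minimal Python sketch of the general idea in two dimensions. It is not the paper's 2D-PI algorithm, and it makes assumptions the abstract does not state: the user's preference is taken to be a linear utility with weight vector (u1, 1 - u1) for an unknown u1 in [0, 1], larger attribute values are taken to be better, and the rule for choosing the next question (the pair whose tie point lies closest to the middle of the remaining interval) is an illustrative heuristic. All function names (`score`, `top_k`, `crossing`, `certified`, `interactive_top_k`) are hypothetical.

```python
from itertools import combinations


def score(p, u1):
    """Linear utility of a 2-D tuple p under the assumed weights (u1, 1 - u1)."""
    return u1 * p[0] + (1.0 - u1) * p[1]


def top_k(points, u1, k):
    """Indices of the k highest-scoring tuples for a known weight u1."""
    order = sorted(range(len(points)), key=lambda i: score(points[i], u1), reverse=True)
    return set(order[:k])


def crossing(p, q):
    """Weight u1 at which p and q score equally, or None if they never tie."""
    slope = (p[0] - p[1]) - (q[0] - q[1])
    if slope == 0:
        return None
    return (q[1] - p[1]) / slope


def certified(points, lo, hi, k, eps=1e-12):
    """Index of a tuple that is in the top-k for every u1 in [lo, hi], else None.

    Scores are linear in u1, so a rival beats p somewhere in [lo, hi] only if
    it beats p at lo or at hi. Fewer than k such rivals is a sufficient
    (not necessary) condition for p to be in the top-k on the whole interval.
    """
    for i, p in enumerate(points):
        rivals = sum(
            1
            for j, q in enumerate(points)
            if j != i
            and (score(q, lo) > score(p, lo) + eps or score(q, hi) > score(p, hi) + eps)
        )
        if rivals < k:
            return i
    return None


def interactive_top_k(points, k, prefers):
    """Ask pairwise questions until one tuple is certified to be in the top-k.

    `prefers(p, q)` stands in for the user: it returns True if p is preferred to q.
    Returns (index of the returned tuple, number of questions asked).
    """
    lo, hi = 0.0, 1.0  # interval of weights u1 still consistent with the answers
    questions = 0
    while True:
        found = certified(points, lo, hi, k)
        if found is not None:
            return found, questions
        mid = (lo + hi) / 2
        best = None
        for i, j in combinations(range(len(points)), 2):
            c = crossing(points[i], points[j])
            if c is not None and lo < c < hi:
                if best is None or abs(c - mid) < abs(best[0] - mid):
                    best = (c, i, j)
        if best is None:
            # Safety net for floating-point corner cases: no informative pair is left.
            return max(range(len(points)), key=lambda i: score(points[i], mid)), questions
        c, i, j = best
        likes_i = prefers(points[i], points[j])
        questions += 1
        i_wins_low = score(points[i], lo) > score(points[j], lo)
        if likes_i == i_wins_low:
            hi = c  # the answer places the hidden weight in [lo, c]
        else:
            lo = c  # the answer places the hidden weight in [c, hi]


# Simulated user with a hidden weight, used only to answer the questions.
hidden_u1 = 0.7
data = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.2, 0.3)]
simulated_user = lambda p, q: score(p, hidden_u1) >= score(q, hidden_u1)
for k in (1, 2):
    index, asked = interactive_top_k(data, k, simulated_user)
    print(f"k={k}: returned tuple {data[index]} after {asked} question(s)")
```

Each answer discards the part of the weight interval that is inconsistent with it, and the loop stops as soon as some tuple is guaranteed to rank in the top-k for every remaining weight. Because only one such tuple is needed, a larger k lets the loop stop earlier, which is the trade-off between user effort and output size that the abstract describes.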


Source journal

Proceedings of the 2021 International Conference on Management of Data

Self-citation rate

0.00%

Number of publications