Towards Tractability of the Diversity of Query Answers: Ultrametrics to the Rescue

Marcelo Arenas, Timo Camillo Merkl, Reinhard Pichler, Cristian Riveros
arXiv - CS - Databases, published 2024-08-03. DOI: arxiv-2408.01657 (https://doi.org/arxiv-2408.01657)

Abstract

The set of answers to a query may be very large, potentially overwhelming users when presented with the entire set. In such cases, presenting only a small subset of the answers to the user may be preferable. A natural requirement for this subset is that it should be as diverse as possible to reflect the variety of the entire population. To achieve this, the diversity of a subset is measured using a metric that determines how different two solutions are and a diversity function that extends this metric from pairs to sets. Several studies have shown that finding a diverse subset of an explicitly given set is intractable even for simple metrics (like Hamming distance) and simple diversity functions (like summing all pairwise distances). This complexity barrier becomes even more challenging when trying to output a diverse subset of a set that is only implicitly given, such as the answers to a query over a database. Until now, tractable cases have been found only for restricted problems and particular diversity functions. To overcome these limitations, we focus on the notion of ultrametrics, which have been widely studied and used in many applications. Starting from any ultrametric $d$ and a diversity function $\delta$ extending $d$, we provide sufficient conditions on $\delta$ under which diverse answers can be constructed in polynomial time. To the best of our knowledge, these conditions are satisfied by all diversity functions considered in the literature. Moreover, we complement these results with lower bounds showing specific cases in which these conditions are not satisfied and finding diverse subsets becomes intractable. We conclude by applying these results to the evaluation of conjunctive queries, demonstrating efficient algorithms for finding a diverse subset of solutions for acyclic conjunctive queries when the attribute order is used to measure diversity.
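To make the abstract's terms concrete, the following Python sketch illustrates an ultrametric, a diversity function that extends it from pairs to sets, and a naive search for a diverse subset. It is only an illustration under stated assumptions, not the paper's algorithm: the distance `prefix_distance` (which scores two tuples by the first attribute on which they differ, as $2^{-i}$) is one plausible way to read "the attribute order is used to measure diversity", and the names `prefix_distance`, `hamming_distance`, `satisfies_strong_triangle`, `sum_diversity`, `min_diversity`, and `most_diverse`, as well as the sample data, are hypothetical. The search is brute force and exponential in the subset size, whereas the paper is about polynomial-time methods.

```python
from itertools import combinations


def prefix_distance(a, b):
    """Assumed attribute-order distance between two equal-length tuples.

    Returns 0 for identical tuples, otherwise 2 ** -i, where i is the
    0-based index of the first attribute on which the tuples differ.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 2.0 ** -i
    return 0.0


def hamming_distance(a, b):
    """Number of positions on which two equal-length tuples differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def satisfies_strong_triangle(points, d):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for every triple of points.

    This strong triangle inequality is what separates an ultrametric
    from an ordinary metric.
    """
    return all(
        d(x, z) <= max(d(x, y), d(y, z))
        for x in points
        for y in points
        for z in points
    )


def sum_diversity(subset, d):
    """Diversity as the sum of all pairwise distances."""
    return sum(d(x, y) for x, y in combinations(subset, 2))


def min_diversity(subset, d):
    """Diversity as the smallest pairwise distance (0 for fewer than two items)."""
    return min((d(x, y) for x, y in combinations(subset, 2)), default=0.0)


def most_diverse(points, k, diversity, d):
    """Brute-force search over all k-subsets; for illustration only."""
    return max(combinations(points, k), key=lambda s: diversity(s, d))


# Hypothetical query answers with attributes (city, dish, rating).
answers = [
    ("rome", "pizza", 1),
    ("rome", "pizza", 2),
    ("rome", "pasta", 1),
    ("oslo", "fish", 1),
]

best = most_diverse(answers, 3, sum_diversity, prefix_distance)
print(best, sum_diversity(best, prefix_distance))
print(satisfies_strong_triangle(answers, prefix_distance))
```

On this sample data the prefix-based distance passes the strong triangle inequality check, while Hamming distance fails it on the three tuples `("a", "a")`, `("a", "b")`, `("b", "b")`, since the outer pair is at distance 2 but each is at distance 1 from the middle one. That contrast is why Hamming distance is a metric but not an ultrametric.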