Marcelo Arenas, Timo Camillo Merkl, Reinhard Pichler, Cristian Riveros
arXiv - CS - Databases, 2024-08-03, https://doi.org/arxiv-2408.01657
Towards Tractability of the Diversity of Query Answers: Ultrametrics to the Rescue
The set of answers to a query may be very large, potentially overwhelming
users when presented with the entire set. In such cases, presenting only a
small subset of the answers to the user may be preferable. A natural
requirement for this subset is that it should be as diverse as possible to
reflect the variety of the entire population. To achieve this, the diversity of
a subset is measured using a metric that determines how different two solutions
are and a diversity function that extends this metric from pairs to sets. In
the past, several studies have shown that finding a diverse subset from an
explicitly given set is intractable even for simple metrics (like Hamming
distance) and simple diversity functions (like summing all pairwise distances).
The problem becomes even more challenging when the diverse subset must be
drawn from a set that is only implicitly given, such as the answers of a
query over a database. Until now, tractable cases have been found
only for restricted problems and particular diversity functions.

To overcome these limitations, we focus on the notion of ultrametrics, which
have been widely studied and used in many applications. Starting from any
ultrametric $d$ and a diversity function $\delta$ extending $d$, we provide
sufficient conditions on $\delta$ under which diverse answers can be
constructed in polynomial time. To the best of our knowledge, these conditions are
satisfied by all diversity functions considered in the literature. Moreover, we
complement these results with lower bounds that exhibit specific cases in which
these conditions are not satisfied and finding diverse subsets becomes intractable.
We conclude by applying these results to the evaluation of conjunctive queries,
demonstrating efficient algorithms for finding a diverse subset of solutions
for acyclic conjunctive queries when the attribute order is used to measure
diversity.
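
To make the notions in the abstract concrete, the following Python sketch shows a metric on tuples (Hamming distance), a diversity function that extends a metric from pairs to sets (the sum of all pairwise distances), and a check of the strong triangle inequality $d(x,z) \le \max(d(x,y), d(y,z))$ that characterizes ultrametrics. All function names and the sample tuples are assumptions made for illustration. In particular, `first_diff_distance` is only one plausible way to derive a distance from an attribute order (the first attribute on which two tuples differ decides the distance) and is not claimed to be the paper's exact definition. The brute-force selection is exponential in general and is not the polynomial-time algorithm the abstract refers to.

```python
from itertools import combinations


def hamming(a, b):
    """Hamming distance: number of positions where two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def first_diff_distance(a, b):
    """Hypothetical attribute-order distance (an assumption, not from the paper):
    0 for equal tuples, otherwise 2**-i where i is the first differing position."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return 2.0 ** -i
    return 0.0


def is_ultrametric(points, d):
    """Check the strong triangle inequality on every triple of the given points."""
    return all(
        d(x, z) <= max(d(x, y), d(y, z))
        for x in points
        for y in points
        for z in points
    )


def sum_diversity(subset, d):
    """Diversity function extending d from pairs to sets: sum of pairwise distances."""
    return sum(d(a, b) for a, b in combinations(subset, 2))


def most_diverse_bruteforce(points, k, d):
    """Try every k-subset and keep one of maximum diversity (exponential in general)."""
    return max(combinations(points, k), key=lambda s: sum_diversity(s, d))


# Sample "query answers" (made up for this sketch).
answers = [(0, 0), (0, 1), (1, 1)]
```

On this sample, Hamming distance fails the ultrametric check because the tuples `(0, 0)` and `(1, 1)` are at distance 2 while each is at distance 1 from `(0, 1)`, whereas the first-difference distance passes it.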