Vasileios Nakos, Hung Q. Ngo, Charalampos E. Tsourakakis
Targeted Least Cardinality Candidate Key for Relational Databases
arXiv - CS - Databases, 2024-08-24, arXiv:2408.13540
Abstract
Functional dependencies (FDs) are a central theme in databases, playing a
major role in the design of database schemas and the optimization of queries.
In this work, we introduce the targeted least cardinality candidate key
problem (TCAND). This problem is defined over a set of functional dependencies
$F$ and a target variable set $T \subseteq V$, and it aims to find the smallest
set $X \subseteq V$ such that the FD $X \to T$ can be derived from $F$. The
TCAND problem generalizes the well-known NP-hard problem of finding the least
cardinality candidate key~\cite{lucchesi1978candidate}, which has been
previously demonstrated to be at least as difficult as the set cover problem.

We present an integer programming (IP) formulation for the TCAND problem,
analogous to a layered set cover problem. We analyze its linear programming
(LP) relaxation from two perspectives: we propose two approximation algorithms
and investigate the integrality gap. Our findings indicate that the
approximation upper bounds for our algorithms are not significantly improvable
through LP rounding, a notable distinction from the standard set cover problem.
Additionally, we discover that a generalization of the TCAND problem is
equivalent to a variant of the set cover problem, named red-blue set
cover~\cite{carr1999red}, which cannot be approximated within a sub-polynomial
factor in polynomial time under plausible
conjectures~\cite{chlamtavc2023approximating}. Despite the extensive history
of the least cardinality candidate key problem, our research contributes new
theoretical insights and novel algorithms, and demonstrates that the general
TCAND problem poses complexities beyond those encountered in the set cover
problem.
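
To make the problem statement concrete, here is a minimal sketch of TCAND solved by exhaustive search. It is not one of the paper's approximation algorithms. It only applies the definition, using the standard attribute-closure procedure to test whether $X \to T$ can be derived from $F$. The function names (`closure`, `tcand_brute_force`) and both toy instances are hypothetical. The second instance encodes a small set cover instance as FDs to illustrate the connection to set cover; it is an illustrative encoding, not necessarily the reduction used in the cited work.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every variable derivable from `attrs` under the FDs in `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, read as lhs -> rhs.
    """
    closed = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is covered and it adds something new.
            if lhs <= closed and not rhs <= closed:
                closed |= rhs
                changed = True
    return closed


def tcand_brute_force(variables, fds, target):
    """Return a smallest X within `variables` whose closure contains `target`.

    Tries subsets in order of increasing size, so the running time is
    exponential; suitable for toy instances only.
    """
    target = set(target)
    for size in range(len(variables) + 1):
        for candidate in combinations(sorted(variables), size):
            if target <= closure(candidate, fds):
                return set(candidate)
    return None  # not reached when target is a subset of variables


# Toy instance 1 (hypothetical): A -> B, BC -> D, D -> E, target {B, E}.
V = {"A", "B", "C", "D", "E"}
F = [
    (frozenset({"A"}), frozenset({"B"})),
    (frozenset({"B", "C"}), frozenset({"D"})),
    (frozenset({"D"}), frozenset({"E"})),
]
best = tcand_brute_force(V, F, {"B", "E"})

# Toy instance 2 (hypothetical): set cover over elements e1..e3 with
# S1 = {e1, e2}, S2 = {e2, e3}, S3 = {e3}. Each set S yields the FD
# S -> (its elements); the target is the whole universe.
cover_vars = {"S1", "S2", "S3", "e1", "e2", "e3"}
cover_fds = [
    (frozenset({"S1"}), frozenset({"e1", "e2"})),
    (frozenset({"S2"}), frozenset({"e2", "e3"})),
    (frozenset({"S3"}), frozenset({"e3"})),
]
cover = tcand_brute_force(cover_vars, cover_fds, {"e1", "e2", "e3"})
```

In the first instance no single variable derives both `B` and `E`, so the search moves on to pairs. In the second, any element chosen directly can be swapped for a set containing it without increasing the size of $X$, so the optimum equals the size of a minimum set cover.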