Integrity Constraints Revisited: From Exact to Approximate Implication
Batya Kenig, Dan Suciu
Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory, volume 26 1, pages 18:1-18:20
Published 2018-12-01. DOI: https://doi.org/10.46298/lmcs-18(1:5)2022

Integrity constraints such as functional dependencies (FD) and multi-valued
dependencies (MVD) are fundamental in database schema design. Likewise,
probabilistic conditional independences (CI) are crucial for reasoning about
multivariate probability distributions. The implication problem studies whether
a set of constraints (antecedents) implies another constraint (consequent), and
has been investigated in both the database and the AI literature, under the
assumption that all constraints hold exactly. However, many applications today
consider constraints that hold only approximately. In this paper we define an
approximate implication as a linear inequality between the degree of
satisfaction of the antecedents and consequent, and we study the relaxation
problem: when does an exact implication relax to an approximate implication? We
use information theory to define the degree of satisfaction, and prove several
results. First, we show that any implication from a set of data dependencies
(MVDs+FDs) can be relaxed to a simple linear inequality with a factor at most
quadratic in the number of variables; when the consequent is an FD, the factor
can be reduced to 1. Second, we prove that there exists an implication between
CIs that does not admit any relaxation; however, we prove that every
implication between CIs relaxes "in the limit". Then, we show that the
implication problem for differential constraints in market basket analysis also
admits a relaxation with a factor equal to 1. Finally, we show how some of the
results in the paper can be derived using the I-measure theory, which relates
information-theoretic measures to set theory. Our results recover, and
sometimes extend, previously known results about the implication problem: the
implication of MVDs and FDs can be checked by considering only 2-tuple
relations.
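
To make the notion of "degree of satisfaction" concrete, here is a minimal sketch, assuming the degree is measured on the empirical (uniform) distribution over the rows of a relation: the conditional entropy H(Y|X) for an FD X -> Y and the conditional mutual information I(Y;Z|X) for an MVD X ->> Y|Z. The function names and the toy relation are hypothetical and are not taken from the paper. The check at the end uses the standard Shannon inequality H(Z|X) <= H(Y|X) + H(Z|Y), which is one instance of a factor-1 relaxation: it is the approximate counterpart of the exact implication {X -> Y, Y -> Z} => X -> Z.

```python
from collections import Counter
from math import log2


def entropy(rows, attrs):
    """Shannon entropy (in bits) of the projection of `rows` onto `attrs`,
    under the uniform distribution over the rows (an assumption of this sketch)."""
    counts = Counter(tuple(row[a] for a in attrs) for row in rows)
    n = len(rows)
    return -sum(c / n * log2(c / n) for c in counts.values())


def fd_degree(rows, lhs, rhs):
    """H(rhs | lhs). It is zero exactly when the FD lhs -> rhs holds in `rows`."""
    return entropy(rows, lhs + rhs) - entropy(rows, lhs)


def mvd_degree(rows, x, y, z):
    """I(y ; z | x). For a duplicate-free relation whose attributes are exactly
    x, y and z, it is zero exactly when the MVD x ->> y | z holds."""
    return (entropy(rows, x + y) + entropy(rows, x + z)
            - entropy(rows, x + y + z) - entropy(rows, x))


# Toy relation (hypothetical data): all three FDs are violated, but only slightly.
rows = [
    {"X": 1, "Y": "a", "Z": "p"},
    {"X": 2, "Y": "a", "Z": "p"},
    {"X": 3, "Y": "b", "Z": "q"},
    {"X": 4, "Y": "b", "Z": "q"},
    {"X": 4, "Y": "c", "Z": "r"},
    {"X": 5, "Y": "c", "Z": "q"},
]

antecedents = fd_degree(rows, ["X"], ["Y"]) + fd_degree(rows, ["Y"], ["Z"])
consequent = fd_degree(rows, ["X"], ["Z"])

# Approximate implication with factor 1: the consequent's violation is
# bounded by the total violation of the antecedents.
print(f"H(Z|X) = {consequent:.4f}, H(Y|X) + H(Z|Y) = {antecedents:.4f}")
print("relaxation holds:", consequent <= antecedents + 1e-12)

# A relation that is a cross product of Y-values and Z-values for a fixed X
# satisfies the MVD X ->> Y | Z exactly, so its degree is zero.
cross = [{"X": 1, "Y": y, "Z": z} for y in "ab" for z in "pq"]
print("I(Y;Z|X) on cross product:", mvd_degree(cross, ["X"], ["Y"], ["Z"]))
```

When every antecedent holds exactly, the right-hand side is zero and the inequality forces the consequent to hold exactly as well, which is how an approximate implication recovers the exact one.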