Discovery and Application of Data Dependencies
Eduardo H. M. Pena, E. Almeida
Anais do XXXIV Concurso de Teses e Dissertações da SBC (CTD-SBC 2021), published 2021-07-18
DOI: 10.5753/CTD.2021.15749 (https://doi.org/10.5753/CTD.2021.15749)
Citations: 0
Abstract
This work addresses central problems related to data dependencies. The first problem is the discovery of dependencies with high expressive power. We introduce an efficient algorithm for discovering denial constraints, a type of dependency expressive enough to generalize other important types of dependencies and to express complex business rules. The second problem is the application of dependencies to improve data consistency. We present a modification to traditional dependency discovery approaches that enables discovery algorithms to return reliable results even when they run on data containing some inconsistent records. We also present a system for efficiently detecting violations of dependencies. Our extensive experimental evaluation shows that this system is up to three orders of magnitude faster than state-of-the-art solutions, especially for larger datasets and massive numbers of dependency violations. The last contribution concerns the application of dependencies in query optimization. We present a system for the automatic discovery and selection of functional dependencies. Our experimental evaluation shows that it selects relevant functional dependencies that help reduce the overall query response time for various types of query workloads.
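To make the notion of a denial constraint concrete, the sketch below represents a constraint as a list of predicates over a pair of tuples and reports every pair that satisfies all of them, which is exactly what the constraint forbids. It is a naive quadratic scan written for illustration, not the discovery algorithm or the violation detection system described in the abstract; the function name `find_violations`, the sample tables, and the two rules are all hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
# A predicate compares two tuples (rows) and returns True when it holds.
Predicate = Callable[[Row, Row], bool]


def find_violations(rows: List[Row], predicates: List[Predicate]) -> List[Tuple[int, int]]:
    """Return ordered index pairs (i, j) for which every predicate holds.

    A denial constraint forbids any pair of tuples from satisfying all of
    its predicates at once, so each returned pair is a violation.
    Naive O(n^2) scan, for illustration only.
    """
    return [
        (i, j)
        for i, j in permutations(range(len(rows)), 2)
        if all(p(rows[i], rows[j]) for p in predicates)
    ]


# Hypothetical data: the same zip code should imply the same city.
addresses = [
    {"zip": "80000", "city": "Curitiba"},
    {"zip": "80000", "city": "Curitba"},  # inconsistent spelling
    {"zip": "01000", "city": "Sao Paulo"},
]

# The functional dependency zip -> city written as a denial constraint:
# not (t.zip == u.zip and t.city != u.city)
fd_as_dc = [
    lambda t, u: t["zip"] == u["zip"],
    lambda t, u: t["city"] != u["city"],
]

# Hypothetical business rule that needs order predicates:
# within a state, a higher salary must not pay a lower tax.
payroll = [
    {"state": "PR", "salary": 5000, "tax": 500},
    {"state": "PR", "salary": 6000, "tax": 400},
]
salary_tax_dc = [
    lambda t, u: t["state"] == u["state"],
    lambda t, u: t["salary"] > u["salary"],
    lambda t, u: t["tax"] < u["tax"],
]

fd_violations = find_violations(addresses, fd_as_dc)
rule_violations = find_violations(payroll, salary_tax_dc)
print(fd_violations)    # [(0, 1), (1, 0)]
print(rule_violations)  # [(1, 0)]
```

The first rule shows how a functional dependency fits the denial-constraint form, and the second shows a rule with inequality predicates that a functional dependency alone cannot express. Comparing every pair of tuples scales quadratically with table size, which hints at why efficient violation detection is a problem in its own right.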