Authors: Wei Cao, Xiongpai Qin, Shan Wang
Venue: 2008 The Ninth International Conference on Web-Age Information Management
Published: 2008-07-20
DOI: 10.1109/WAIM.2008.21 (https://doi.org/10.1109/WAIM.2008.21)
COCA: More Accurate Multidimensional Histograms out of More Accurate Correlations Detection
Detecting and exploiting correlations among columns in relational databases is of great value to query optimizers, because it helps them generate better query execution plans (QEPs). We propose a more robust and informative metric, the entropy correlation coefficient, as an alternative to the chi-square test for detecting correlations among columns in large datasets. We introduce a novel yet simple kind of multi-dimensional synopsis, named COCA-Hist, to cope with different correlations in databases. With the aid of the precise metric of entropy correlation coefficients, correlations of various degrees can be detected effectively; when the coefficients indicate mutual independence among columns, the AVI (attribute value independence) assumption can be adopted with confidence. Like CORDS, COCA can also serve as a data-mining tool with superior qualities. We demonstrate the effectiveness and accuracy of our approach through several experiments.
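The abstract does not give the formula for the entropy correlation coefficient, so the following is only a minimal sketch under an assumption: it uses one common entropy-based definition, 2·I(X;Y) / (H(X) + H(Y)), which is 0 for independent columns and 1 when each column determines the other. The function names (`entropy`, `entropy_correlation`, `avi_selectivity`, `true_selectivity`) are hypothetical and are not taken from the paper. The sketch also illustrates why the AVI assumption is safe only when the coefficient is near zero.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_correlation(xs, ys):
    """Assumed definition: 2 * I(X;Y) / (H(X) + H(Y)), a value in [0, 1].

    This is one common entropy-based coefficient; the paper's exact
    formula may differ.
    """
    hx = entropy(xs)
    hy = entropy(ys)
    hxy = entropy(list(zip(xs, ys)))  # joint entropy H(X, Y)
    if hx + hy == 0:
        return 0.0  # both columns are constant; treat as uncorrelated
    mutual_information = hx + hy - hxy
    return 2 * mutual_information / (hx + hy)


def avi_selectivity(xs, ys, x, y):
    """Selectivity of (X = x AND Y = y) under attribute value independence."""
    n = len(xs)
    return (xs.count(x) / n) * (ys.count(y) / n)


def true_selectivity(xs, ys, x, y):
    """Actual fraction of rows satisfying (X = x AND Y = y)."""
    n = len(xs)
    return sum(1 for a, b in zip(xs, ys) if a == x and b == y) / n


# Two toy tables: one with independent columns, one where X determines Y.
independent = ([0, 0, 1, 1], [0, 1, 0, 1])
dependent = ([0, 0, 1, 1], [5, 5, 7, 7])

for name, (xs, ys) in (("independent", independent), ("dependent", dependent)):
    print(name, entropy_correlation(xs, ys))
```

On the dependent table, the AVI estimate for `X = 0 AND Y = 5` is 0.25 while the true selectivity is 0.5, which is the kind of error a multi-dimensional synopsis is meant to avoid; on the independent table the two agree.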