Entropy based approximate querying and exploration of datacubes
Authors: Themis Palpanas, Nick Koudas
Published in: Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001 (2001-07-18)
DOI: https://doi.org/10.1109/SSDM.2001.938541
Much research has been devoted to the efficient computation of relational aggregations, and specifically to the efficient execution of the datacube operation. We consider the inverse problem: deriving (approximately) the original data from the aggregates. We motivate this problem in the context of two application areas, approximate query answering and data analysis. We propose a framework based on the notion of information entropy that enables us to estimate the original values in a data set, given only aggregated information about it. We also describe an alternative use of the proposed framework that enables us to identify values that deviate from the underlying data distribution, which is suitable for data mining purposes. Finally, we present a detailed performance study of the algorithms using both real and synthetic data, highlighting the benefits of our approach as well as the efficiency of the proposed solutions.
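The abstract does not spell out the algorithms, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes the simplest possible case: a two-dimensional table for which only the row totals and column totals are known. Under that assumption the maximum-entropy estimate of each cell has a closed form, (row total × column total) / grand total. The function names and the sample data are hypothetical. The second function illustrates the data-analysis use: cells whose actual values differ most from the estimate are the ones that deviate from what the aggregates alone would suggest.

```python
def max_entropy_estimate(row_sums, col_sums):
    """Estimate the cells of a 2-D table from its row and column totals.

    With only these two sets of totals as constraints, the
    maximum-entropy estimate of cell (i, j) is
    row_sums[i] * col_sums[j] / grand_total.
    """
    total = sum(row_sums)
    if total <= 0:
        raise ValueError("the grand total must be positive")
    if abs(total - sum(col_sums)) > 1e-9:
        raise ValueError("row and column totals must agree")
    return [[r * c / total for c in col_sums] for r in row_sums]


def deviations(actual, estimate):
    """Cell-wise difference between the actual values and the estimate."""
    return [[a - e for a, e in zip(a_row, e_row)]
            for a_row, e_row in zip(actual, estimate)]


# Hypothetical data, chosen only for illustration.
actual = [[10, 20], [30, 140]]
row_sums = [sum(row) for row in actual]        # aggregate over columns
col_sums = [sum(col) for col in zip(*actual)]  # aggregate over rows

estimate = max_entropy_estimate(row_sums, col_sums)
diff = deviations(actual, estimate)
```

The estimate reproduces the given row and column totals exactly, so an approximate query that sums whole rows or columns is answered without error; the error appears only for queries over individual cells or partial ranges. With more dimensions or overlapping aggregates there is generally no closed form, and an iterative fitting procedure would be needed instead.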