MM-Cubing: computing Iceberg cubes by factorizing the lattice space
Authors: Zheng Shao, Jiawei Han, Dong Xin
Published in: Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004 (2004-06-21)
DOI: https://doi.org/10.1109/SSDBM.2004.53
Citations: 50
Abstract
The computation of data cubes and iceberg cubes has been studied by many researchers. Three major approaches have been developed: (1) top-down computation, represented by MultiWay array aggregation (Zhao et al., 1997), which exploits shared computation and performs well on dense data sets; (2) bottom-up computation, represented by BUC (Beyer and Ramakrishnan, 1999), which takes advantage of Apriori pruning and performs well on sparse data sets; and (3) integrated top-down and bottom-up computation, represented by Star-Cubing (Xin et al., 2003), which takes advantage of both and performs well in most cases. However, the performance of Star-Cubing degrades on very sparse data sets because of the additional cost introduced by its tree structure. None of the three approaches achieves uniformly high performance on all kinds of data sets. In this paper, we present a new approach that computes iceberg cubes by factorizing the lattice space according to the frequency of values. Unlike the previous dimension-based approaches, which do not recognize the importance of data distribution, this approach partitions the cube lattice into one dense subspace and several sparse subspaces. Based on this approach, a new method called MM-Cubing has been developed. MM-Cubing is highly adaptive to dense, sparse, or skewed data sets. Our performance study shows that MM-Cubing is efficient and achieves high performance over all kinds of data distributions.
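To make the iceberg cube problem and the bottom-up strategy concrete, the following is a minimal Python sketch in the style of BUC: it partitions the data one dimension at a time and applies Apriori pruning, so a partition below the minimum support is never expanded. It is not MM-Cubing and not the authors' code. The names `iceberg_cube`, `expand`, `ALL`, and `min_sup` are hypothetical, and using a plain row count as the measure is an assumption made for illustration.

```python
from collections import defaultdict

ALL = "*"  # hypothetical marker for a dimension that has been aggregated away


def iceberg_cube(rows, num_dims, min_sup):
    """Return {cell: count} for every group-by cell with count >= min_sup.

    A cell is a tuple of length num_dims; ALL marks an aggregated dimension.
    The measure is assumed to be a simple row count.
    """
    result = {}

    def expand(partition, start_dim, cell):
        # Every partition that reaches this point already satisfies min_sup.
        result[tuple(cell)] = len(partition)
        for d in range(start_dim, num_dims):
            # Split the current partition on the values of dimension d.
            groups = defaultdict(list)
            for row in partition:
                groups[row[d]].append(row)
            for value, group in groups.items():
                # Apriori pruning: a group below min_sup cannot contain any
                # more specific cell that qualifies, so it is skipped.
                if len(group) >= min_sup:
                    cell[d] = value
                    expand(group, d + 1, cell)
            cell[d] = ALL  # restore the wildcard before the next dimension

    if len(rows) >= min_sup:
        expand(rows, 0, [ALL] * num_dims)
    return result


# Illustrative data: four rows over two dimensions.
rows = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "x")]
cube = iceberg_cube(rows, num_dims=2, min_sup=2)
```

With `min_sup=2`, the value `b` in the first dimension and the value `y` in the second are pruned as soon as they are seen, which is why this style of computation suits sparse data. The abstract's point is that pruning alone does not exploit shared computation on dense regions, which motivates treating frequent and infrequent values differently.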