Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Latest papers, page 6

Knowledge Sifter: ontology-driven search over heterogeneous databases

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.46

L. Kerschberg, Mizan Chowdhury, A. Damiano, Hanjo Jeong, Scott Mitchell, Jingwei Si, Stephen Smith

Citations: 24

East of Neuchatel: a universal model for the representation of statistical taxonomy systems

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.31

M. Denk, K. Froeschl

Citations: 1

Accessing and visualizing scientific spatiotemporal data

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.11

D. Katz, Attila Bergou, G. Berriman, Gary L. Block, J. Collier, D. Curkendall, J. Good, L. Husman, J. Jacob, A. Laity, Peggy Li, C. Miller, T. Prince, H. Siegel, Roy Williams

Citations: 4

An efficient method to find area clusters with constraints using grid index structure

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.14

Kwang-Su Yang, Ruixin Yang, Jiang Tang, M. Kafatos

Citations: 0

MM-Cubing: computing Iceberg cubes by factorizing the lattice space

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.53

Zheng Shao, Jiawei Han, Dong Xin

Abstract: The data cube and iceberg cube computation problem has been studied by many researchers. Three major approaches have been developed in this direction: (1) top-down computation, represented by MultiWay array aggregation (Zhao et al., 1997), which utilizes shared computation and performs well on dense data sets; (2) bottom-up computation, represented by BUC (Beyer and Ramakrishnan, 1999), which takes advantage of Apriori pruning and performs well on sparse data sets; and (3) integrated top-down and bottom-up computation, represented by Star-Cubing (Xin et al., 2003), which takes advantage of both and has high performance in most cases. However, the performance of Star-Cubing degrades on very sparse data sets due to the additional cost introduced by the tree structure. None of the three approaches achieves uniformly high performance on all kinds of data sets. In this paper, we present a new approach that computes Iceberg cubes by factorizing the lattice space according to the frequency of values. This approach, unlike all the previous dimension-based approaches, in which the importance of data distribution is not recognized, partitions the cube lattice into one dense subspace and several sparse subspaces. With this approach, a new method called MM-Cubing has been developed. MM-Cubing is highly adaptive to dense, sparse, or skewed data sets. Our performance study shows that MM-Cubing is efficient and achieves high performance over all kinds of data distributions.

Citations: 50

A comparative study of spatial indexing techniques for multidimensional scientific datasets

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.1

Beomseok Nam, A. Sussman

Abstract: Scientific applications that query very large multidimensional datasets are becoming more common. These datasets are growing in size every day, and are becoming truly enormous, making it infeasible to index individual data elements. We have instead been experimenting with chunking the datasets to index them, grouping data elements into small chunks of a fixed, but dataset-specific, size to take advantage of spatial locality. While spatial indexing structures based on R-trees perform reasonably well for the rectangular bounding boxes of such chunked datasets, other indexing structures based on KDB-trees, such as Hybrid trees, have been shown to perform very well for point data. In this paper, we investigate how all these indexing structures perform for multidimensional scientific datasets, and compare their features and performance with those of SH-trees, an extension of Hybrid trees, for indexing multidimensional rectangles. Our experimental results show that the algorithms for building and searching SH-trees outperform those for R-trees, R*-trees, and X-trees for both real application and synthetic datasets and queries. We show that the SH-tree algorithms perform well for both low- and high-dimensional data, and that they scale well to high dimensions both for building and searching the trees.

Citations: 32

COBBLER: combining column and row enumeration for closed pattern discovery

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.21

Feng Pan, A. Tung, G. Cong, Xin Xu

Abstract: The problem of mining frequent closed patterns has received considerable attention recently, as it promises to have much less redundancy compared to discovering all frequent patterns. Existing algorithms can presently be separated into two groups: feature (column) enumeration and row enumeration. Feature enumeration algorithms like CHARM and CLOSET+ are efficient for datasets with a small number of features and a large number of rows, since the number of feature combinations to be enumerated is small. Row enumeration algorithms like CARPENTER, on the other hand, are more suitable for datasets (e.g., bioinformatics data) with a large number of features and a small number of rows. Both groups of algorithms, however, will encounter problems for datasets that have a large number of rows and features. In this paper, we describe a new algorithm called COBBLER which can efficiently mine such datasets. COBBLER is designed to dynamically switch between feature enumeration and row enumeration depending on the data characteristics in the process of mining. As such, each portion of the dataset can be processed using the most suitable method, making the mining more efficient. Several experiments on real-life and synthetic datasets show that COBBLER is an order of magnitude better than previous closed pattern mining algorithms like CHARM, CLOSET+, and CARPENTER.

Citations: 64

Retrieval of isomorphic substructures in crystallographic databases

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.59

H. Klein

Citations: 4

Grid-based metadata services

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.39

E. Deelman, Gurmeet Singh, M. Atkinson, A. Chervenak, Neil Philippe Chue Hong, C. Kesselman, Sonal Patil, L. Pearlman, Mei-Hui Su

Citations: 99

On the integration of autonomous data marts

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.57

L. Cabibbo, Riccardo Torlone

Citations: 29