Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Latest articles

Self-deadlocks in disparate scientific data management systems

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.63

F. Pentaris, Y. Ioannidis

Citations: 0

Efficient query processing on relational data-partitioning index structures

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.32

H. Kriegel, Peter Kunath, M. Pfeifle, M. Renz

Abstract: In contrast to space-partitioning index structures, data-partitioning index structures naturally adapt to the actual data distribution which results in a very good query response behavior. Besides efficient query processing, modern database applications including computer-aided design, medical imaging, or molecular biology require fully-fledged database management systems in order to guarantee industrial-strength. In this paper, we show how we can achieve efficient query processing on data-partitioning index structures within general purpose database systems. We reduce the navigational index traversal cost by using "extended index range scans". If a directory node is "largely" covered by the actual query, the recursive tree traversal for this node can beneficially be replaced by a scan on the leaf level of the index instead of navigating through the directory any longer. On the other hand, for highly selective queries, the index is used as usual. In this paper, we demonstrate the benefits of this idea for spatial collision queries on the relational R-tree. Our experiments with an Oracle9i database system show that our new approach outperforms common index structures and the sequential scan considerably.
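
The traversal rule described in the abstract above can be illustrated with a small in-memory sketch. This is not the authors' relational R-tree implementation: the `Node` layout, the `first_leaf`/`last_leaf` range fields, and the `threshold` value of 0.8 are all assumptions made for illustration. The idea shown is only the switch itself: when the query covers a large enough fraction of a directory node's bounding rectangle, the contiguous leaf range below that node is scanned instead of descending further.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(r: Rect) -> float:
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def intersection(a: Rect, b: Rect) -> Rect:
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def intersects(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    """Hypothetical directory node: its leaf entries are leaves[first_leaf:last_leaf]."""
    mbr: Rect
    first_leaf: int
    last_leaf: int
    children: List["Node"] = field(default_factory=list)


def query(node: Node, q: Rect, leaves: List[Rect], threshold: float = 0.8) -> List[int]:
    """Return indices of leaf rectangles that intersect q."""
    if not intersects(node.mbr, q):
        return []
    node_area = area(node.mbr)
    covered = area(intersection(node.mbr, q)) / node_area if node_area > 0 else 1.0
    if not node.children or covered >= threshold:
        # Node is "largely" covered (assumed threshold) or has no children:
        # scan its leaf range sequentially and filter.
        return [i for i in range(node.first_leaf, node.last_leaf)
                if intersects(leaves[i], q)]
    # Selective query: navigate the directory as usual.
    hits: List[int] = []
    for child in node.children:
        hits.extend(query(child, q, leaves, threshold))
    return hits
```

In a relational setting the leaf scan would presumably be a range scan over a leaf table ordered by node id; the list slice here only stands in for that.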

Citations: 2

A weight-based map matching method in moving objects databases

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.10

Huabei Yin, O. Wolfson

Citations: 143

MDDQL-Stat: data querying and analysis through integration of intentional and extensional semantics

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.49

E. Kapetanios, David Baer, Björn Glaus, Paul Groenewoud

Abstract: We would like to present a prototype system enabling a rather empirical than a formal approach to the problem of posing queries to a semantically rich (quality aspects, semantic distance, etc.) data integration system {G,S,M} (Global schema, Sources, Mediation) through integration not only of intensional but also of extensional semantics. While the first is provided by an alphabet A as given by an ontology based global schema C, and a high level query language (conjunction/disjunction + inequalities + statistical operations), the latter enables synthesizing of data source specific and previously transformed query results according to well-defined set operations for heterogeneous, distributed data sources. Our approach contrasts with other GAV (Global-As-View) related architectures for mediation of integrated read-only views, in that it simplifies query processing while preserving flexibility when adding new data sources, despite the inherited complexity of mappings due to enhanced semantic description of data (semantic distance, quality parameters, etc.) such that statistical results and comparisons become more meaningful.

Citations: 9

AutoPart: automating schema design for large scientific databases using data partitioning

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.19

Stratos Papadomanolakis, A. Ailamaki

Abstract: Database applications that use multi-terabyte datasets are becoming increasingly important for scientific fields such as astronomy and biology. Scientific databases are particularly suited for the application of automated physical design techniques, because of their data volume and the complexity of the scientific workloads. Current automated physical design tools focus on the selection of indexes and materialized views. In large-scale scientific databases, however the data volume and the continuous insertion of new data allows for only limited indexes and materialized views. By contrast, data partitioning does not replicate data, thereby reducing space requirements and minimizing update overhead. In this paper we present AutoPart, an algorithm that automatically partitions database tables to optimize sequential access assuming prior knowledge of a representative workload. The resulting schema is indexed using a fraction of the space required for indexing the original schema. To evaluate AutoPart we built an automated schema design tool that interfaces to commercial database systems. We experiment with AutoPart in the context of the Sloan Digital Sky Survey database, a real-world astronomical database, running on SQL Server 2000. Our experiments demonstrate the benefits of partitioning for large-scale systems: partitioning alone improves query execution performance by a factor of two on average. Combined with indexes, the new schema also outperforms the indexed original schema by 20% (for queries) and a factor of five (for updates), while using only half the original index space.
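
The abstract above does not spell out how tables are split, so the following is only a loose illustration of workload-driven partitioning, not AutoPart itself. It assumes a vertical split and groups columns that are read by exactly the same set of workload queries, so that a sequential scan for one query touches fewer unneeded columns. The function name, the column names, and the query ids are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def co_access_fragments(columns: List[str],
                        workload: Dict[str, Set[str]]) -> List[List[str]]:
    """Group columns by the exact set of queries that read them.

    workload maps a query id to the set of columns that query reads.
    Columns no query reads end up together in one fragment.
    """
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for col in columns:
        readers = frozenset(q for q, cols in workload.items() if col in cols)
        groups[readers].append(col)
    # Sort fragments for a deterministic result.
    return sorted(groups.values())


# Hypothetical sky-survey style table and a two-query workload.
columns = ["objid", "ra", "dec", "mag_r", "mag_g", "flags"]
workload = {
    "q1": {"objid", "ra", "dec"},
    "q2": {"objid", "mag_r", "mag_g"},
}
fragments = co_access_fragments(columns, workload)
```

Here `fragments` separates the position columns from the magnitude columns, keeps the shared key on its own, and isolates the column that no query reads.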

Citations: 154

A scalable approach to approximating aggregate queries over intermittent streams

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.6

Shanzhong Zhu, C. Ravishankar

Citations: 5

A fast algorithm for subspace clustering by pattern similarity

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.3

Haixun Wang, F. Chu, W. Fan, Philip S. Yu, J. Pei

Abstract: Unlike traditional clustering methods that focus on grouping objects with similar values on a set of dimensions, clustering by pattern similarity finds objects that exhibit a coherent pattern of rise and fall in subspaces. Pattern-based clustering extends the concept of traditional clustering and benefits a wide range of applications, including large scale scientific data analysis, target marketing, Web usage analysis, etc. However, state-of-the-art pattern-based clustering methods (e.g., the pCluster algorithm) can only handle data sets of thousands of records, which makes them inappropriate for many real-life applications. Furthermore, besides the huge data volume, many data sets are also characterized by their sequentiality, for instance, customer purchase records and network event logs are usually modeled as data sequences. Hence, it becomes important to enable pattern-based clustering methods i) to handle large datasets, and ii) to discover pattern similarity embedded in data sequences. In this paper, we present a novel algorithm that offers this capability. Experimental results from both real life and synthetic datasets prove its effectiveness and efficiency.
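
To make "a coherent pattern of rise and fall" concrete, the sketch below checks pattern similarity with the pScore measure of the earlier pCluster model that the abstract names as prior work. It is not the new algorithm this paper presents, and it is the brute-force check, which is exactly the kind of cost that limits such methods to small inputs. The function names and the sample rows are illustrative assumptions.

```python
from itertools import combinations
from typing import Sequence


def p_score(x: Sequence[float], y: Sequence[float], a: int, b: int) -> float:
    """pScore of the 2x2 submatrix formed by objects x, y and columns a, b."""
    return abs((x[a] - x[b]) - (y[a] - y[b]))


def is_pattern_cluster(rows: Sequence[Sequence[float]],
                       cols: Sequence[int],
                       delta: float) -> bool:
    """True if every pair of rows has pScore <= delta on every pair of columns."""
    return all(
        p_score(x, y, a, b) <= delta
        for x, y in combinations(rows, 2)
        for a, b in combinations(cols, 2)
    )


# The second row is the first shifted by 10, so both rise and fall together
# even though their values are far apart; the third row deviates slightly.
rows = [[1, 5, 3], [11, 15, 13], [4, 8, 7]]
exact = is_pattern_cluster(rows[:2], [0, 1, 2], delta=0)
strict = is_pattern_cluster(rows, [0, 1, 2], delta=0)
loose = is_pattern_cluster(rows, [0, 1, 2], delta=1)
```

A distance-based method would place the first two rows far apart, while the pattern-based check treats them as a perfect match.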

Citations: 43

SaIL: a library for efficient application integration of spatial indices

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.60

Marios Hadjieleftheriou, E. Hoel, V. Tsotras

Citations: 3

HybridTreeMiner: an efficient algorithm for mining frequent rooted trees and free trees using canonical forms

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.41

Yun Chi, Yirong Yang, R. Muntz

Abstract: Tree structures are used extensively in domains such as computational biology, pattern recognition, XML databases, computer networks, and so on. In this paper, we present HybridTreeMiner, a computationally efficient algorithm that discovers all frequently occurring subtrees in a database of rooted unordered trees. The algorithm mines frequent subtrees by traversing an enumeration tree that systematically enumerates all subtrees. The enumeration tree is defined based on a novel canonical form for rooted unordered trees - the breadth-first canonical form (BFCF). By extending the definitions of our canonical form and enumeration tree to free trees, our algorithm can efficiently handle databases of free trees as well. We study the performance of our algorithms through extensive experiments based on both synthetic data and datasets from real applications. The experiments show that our algorithm is competitive in comparison to known rooted tree mining algorithms and is faster by one to two orders of magnitudes compared to a known algorithm for mining frequent free trees.
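
Why a canonical form matters for unordered trees can be shown with a short sketch. The encoding below is a simple recursive one (sort the children's encodings, then concatenate) and is not the breadth-first canonical form defined in the paper. It only demonstrates the property any canonical form must have: two trees that differ solely in sibling order map to the same string, so a miner can count them as one pattern. The tuple representation is an assumption, and labels are assumed to contain no parentheses.

```python
from typing import List, Tuple

# A tree is (label, [children]); labels must not contain "(" or ")".
Tree = Tuple[str, List["Tree"]]


def canonical(tree: Tree) -> str:
    """Order-independent string encoding of a rooted, labeled, unordered tree."""
    label, children = tree
    # Sorting the children's encodings removes any dependence on sibling order.
    return label + "(" + "".join(sorted(canonical(c) for c in children)) + ")"


t1: Tree = ("A", [("B", []), ("C", [("D", [])])])
t2: Tree = ("A", [("C", [("D", [])]), ("B", [])])   # same tree, siblings swapped
t3: Tree = ("A", [("B", [("D", [])]), ("C", [])])   # D hangs under B instead
```

With this encoding `t1` and `t2` share one string while `t3` gets a different one.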

Citations: 146

A shrinking-based dimension reduction approach for multi-dimensional analysis

Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004. Pub Date : 2004-06-21 DOI: 10.1109/SSDBM.2004.8

Yong Shi, A. Zhang

Abstract: In this paper, we present continuous research on data analysis based on our previous work on the shrinking approach. Shrinking is a novel data preprocessing technique which optimizes the inner structure of data inspired by the Newton's Universal Law of Gravitation in the real world. It can be applied in many data mining fields. Following our previous work on the shrinking method for multidimensional data analysis in full data space, we propose a shrinking-based dimension reduction approach which tends to solve the dimension reduction problem from a new perspective. In this approach data are moved along the direction of the density gradient, thus making the inner structure of data more prominent. It is conducted on a sequence of grids with different cell sizes. Dimension reduction process is performed based on the difference of the data distribution projected on each dimension before and after the data-shrinking process. Those dimensions with dramatic variation of data distribution through the data-shrinking process are selected as good dimension candidates for further data analysis. This approach can assist to improve the performance of existing data analysis approaches. We demonstrate how this shrinking-based dimension reduction approach affects the clustering results of well known algorithms.
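
The selection step described in the abstract above, comparing each dimension's projected distribution before and after shrinking, might be sketched as follows. The shrinking step itself is not shown, and the choice of equal-width histograms with an L1 distance is an assumption made here for illustration; the abstract does not name the measure the authors use. Values are assumed to lie in the range from `lo` to `hi`.

```python
from typing import List, Sequence


def histogram(values: Sequence[float], bins: int, lo: float, hi: float) -> List[float]:
    """Normalized equal-width histogram; assumes lo <= v <= hi for every value."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp v == hi into last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def dimension_change(before: Sequence[Sequence[float]],
                     after: Sequence[Sequence[float]],
                     bins: int = 4, lo: float = 0.0, hi: float = 1.0) -> List[float]:
    """Per-dimension L1 distance between projected histograms (assumed measure)."""
    scores = []
    for d in range(len(before[0])):
        hb = histogram([p[d] for p in before], bins, lo, hi)
        ha = histogram([p[d] for p in after], bins, lo, hi)
        scores.append(sum(abs(x - y) for x, y in zip(hb, ha)))
    return scores


# Hypothetical data: shrinking left dimension 0 untouched and collapsed dimension 1.
before = [(0.1, 0.1), (0.4, 0.3), (0.6, 0.6), (0.9, 0.8)]
after = [(0.1, 0.1), (0.4, 0.1), (0.6, 0.1), (0.9, 0.1)]
scores = dimension_change(before, after)
best = max(range(len(scores)), key=scores.__getitem__)
```

Dimensions would then be ranked by score, with the largest changes kept as candidates for further analysis.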

Citations: 10