{"title":"Efficient Approximation of Spatial Network Queries using the M-Tree with Road Network Embedding","authors":"K. Shaw, Elias Ioup, J. Sample, M. Abdelguerfi, Olivier Tabone","doi":"10.1109/SSDBM.2007.11","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.11","url":null,"abstract":"Spatial networks, such as road systems, operate differently from normal geospatial systems because objects are constrained to locations on the network. Performing queries on spatial networks demands entirely different solutions. Most spatial queries make use of an R-Tree to process them efficiently. The M-Tree is a data tree index which is capable of indexing data in any metric space. The M-Tree index can replace the R-Tree index for spatial network queries, such as range and KNN queries. The difficulty is that the M-Tree is only as efficient as the distance algorithm used on the underlying objects. Most network distance algorithms, such as A*, are too slow to allow the M-Tree to operate efficiently on spatial networks. The truncated road network embedding (tRNE) maps the network into a higher dimensional space where any LP metric can be used to efficiently compute an accurate approximation of network distance. The M-Tree combined with tRNE creates an efficient index structure for computing spatial network queries. The M-Tree substantially outperforms network expansion, the most popular method of computing spatial network queries, when performing spatial network KNN and range queries.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125414061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Exploring Complex Relationships of Correlation Clusters","authors":"Elke Achtert, C. Böhm, H. Kriegel, Peer Kröger, A. Zimek","doi":"10.1109/SSDBM.2007.21","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.21","url":null,"abstract":"In high dimensional data, clusters often only exist in arbitrarily oriented subspaces of the feature space. In addition, these so-called correlation clusters may have complex relationships between each other. For example, a correlation cluster in a 1-D subspace (forming a line) may be enclosed within one or even several correlation clusters in 2-D superspaces (forming planes). In general, such relationships can be seen as a complex hierarchy that allows multiple inclusions, i.e. clusters may be embedded in several super-clusters rather than only in one. Obviously, uncovering the hierarchical relationships between the detected correlation clusters is an important information gain. Since existing approaches cannot detect such complex hierarchical relationships among correlation clusters, we propose the algorithm ERiC to tackle this problem and to visualize the result by means of a graph-based representation. In our experimental evaluation, we show that ERiC finds more information than state-of-the-art correlation clustering methods and outperforms existing competitors in terms of efficiency.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128050558","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Managing Scientific Data: New Challenges for Database Research","authors":"M. Winslett","doi":"10.1109/SSDBM.2007.18","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.18","url":null,"abstract":"The database research community's appetite for new applications has led to increased interest in the data management needs of scientists. This area encompasses a huge range of applications, extending from public repositories of observational data such as the popular Sloan Digital Sky Survey to one-of-a-kind runs of simulation codes crafted by individual scientists. In this talk, we will survey the most common data management needs found in the hard sciences, describe the new database research challenges that arise from these needs, and outline ways to address some of these challenges.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"146 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116466770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MonetDB/SQL Meets SkyServer: the Challenges of a Scientific Database","authors":"M. Ivanova, N. Nes, R. Goncalves, M. Kersten","doi":"10.1109/SSDBM.2007.19","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.19","url":null,"abstract":"This paper presents our experiences in porting the Sloan Digital Sky Survey (SDSS)/SkyServer to the state-of-the-art open source database system MonetDB/SQL. SDSS acts as a well-documented benchmark for scientific database management. We have achieved a fully functional prototype for the personal SkyServer, to be downloaded from our site. The lessons learned are 1) the column store approach of MonetDB demonstrates a great potential in the world of scientific databases. However, the application also challenged the functionality of our implementation and revealed that a fully operational SQL environment is needed, e.g. including persistent stored modules; 2) the initial performance is competitive with the reference platform, MS SQL Server 2005, and 3) the analysis of SDSS query traces hints at several techniques to boost performance by utilizing repetitive behavior and zoom-in/zoom-out access patterns, that are currently not captured by the system.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128815029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effective Summarization of Multi-Dimensional Data Streams for Historical Stream Mining","authors":"Samer Nassar, J. Sander","doi":"10.1109/SSDBM.2007.32","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.32","url":null,"abstract":"We consider the following problem: given a very large data stream, a limited space to encode the stream, and a compression technique to compress the stream, retain the most important information from the distant past of the stream while at the same time retain high quality of the compressed information that is in the recent part of the stream to perform temporal analysis of the summarized information. Simple schemes for accumulating micro-clustering summaries of stream windows that have been previously proposed are very ineffective for solving this challenging task. We overcome the limitations of these schemes by first identifying spatial summaries that compress \"similar\" regions in the data space, and reduce their space consumption using novel approximate spatio-temporal summaries. Second, we present policies for effectively utilizing the space budget and managing these novel approximate spatio-temporal summaries.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131866442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient Indexing of Heterogeneous Data Streams with Automatic Performance Configurations","authors":"K. Pu, Ying Zhu","doi":"10.1109/SSDBM.2007.33","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.33","url":null,"abstract":"We study the problem of indexing continuous data streams in which data are heterogeneous in structure. Such data streams arise naturally in many real-life scenarios such as sensor networks. Our index structure uses bitmap based techniques to efficiently sketch the structures to allow space-efficient lossless archiving of the data stream. It also allows very fast query processing on the archived data stream. Furthermore, our index structure adapts to structural evolutions of the stream to ensure good indexing and querying performance both in space and time. We developed a cost-based optimization framework so the indexing engine adjusts its configuration at run-time to adapt to changes in the data stream. By means of linear feedback controllers, structural clustering and steepest gradient ascent optimization, our indexing engine can achieve excellent performance without any human intervention.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122246789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On Efficient Processing of Subspace Skyline Queries on High Dimensional Data","authors":"Wen Jin, A. Tung, M. Ester, Jiawei Han","doi":"10.1109/SSDBM.2007.20","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.20","url":null,"abstract":"Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focuses on pre-materializing a set of skyline points in various subspaces, while the second focuses on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite efforts to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remains exponential in the number of dimensions. The query time for the second approach, on the other hand, also grows substantially for data with higher dimensionality, where the pruning power of anchors becomes much weaker. In this paper, we propose methods for answering subspace skyline queries on high dimensional data such that both pre-materialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space and use these concepts as the foundation for answering subspace skyline queries for high dimensional data. Query processing involves mostly simple pruning operations while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrate the efficiency and effectiveness of our methods.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"49 2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130293608","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incorporating Uncertainty Metrics into a General-Purpose Data Integration System","authors":"Brenton Louie, L. Detwiler, Nilesh N. Dalvi, Ron Shaker, P. Tarczy-Hornoch, Dan Suciu","doi":"10.1109/SSDBM.2007.36","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.36","url":null,"abstract":"There is a significant need for data integration capabilities in the scientific domain, which has manifested itself as products in the commercial world as well as academia. However, in our experiences in dealing with biological data it has become apparent to us that existing data integration products do not handle uncertainties in the data very well. This leads to systems that often produce an explosion of less relevant answers which subsequently leads to a loss of more relevant answers by overloading the user. How to incorporate functionality into data integration systems to properly handle uncertainties and make results more useful has become an important research question. In this paper we describe an enhanced general-purpose data integration system which incorporates uncertainty metrics within a formal probabilistic framework. Additionally, for evaluation purposes, we have implemented a use case scenario which utilizes biological data sources and performed a study which provides validation of system query results.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"182 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132128525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Window-Oblivious Join: A Data-Driven Memory Management Scheme for Stream Join","authors":"Ji Wu, K. Tan, Yongluan Zhou","doi":"10.1109/SSDBM.2007.43","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.43","url":null,"abstract":"Memory management is a critical issue in stream processing involving stateful operators such as join. Traditionally, the memory requirement for a stream join is query-driven: a query has to explicitly define a window for each (potentially unbounded) input. The window essentially bounds the size of the buffer allocated for that stream. However, outputs produced by such an approach may not be desirable (if the window size is not part of the intended query semantics) due to the volatile input characteristics. We discover that when streams are ordered or partially ordered, it is possible to use a data-driven memory management scheme for improved performance. In this work, we present a novel data-driven memory management scheme, called Window-Oblivious Join (WO-Join), which adaptively adjusts the state buffer size according to the input characteristics. Our performance study shows that, compared to traditional Window-Join (W-Join), WO-Join is more robust with respect to the dynamic inputs and therefore produces higher quality results with lower memory costs.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"180 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122592825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Wavelet Density Estimators over Data Streams","authors":"C. Heinz, B. Seeger","doi":"10.1109/SSDBM.2007.28","DOIUrl":"https://doi.org/10.1109/SSDBM.2007.28","url":null,"abstract":"A variety of scientific and commercial applications requires an immediate analysis of transient data streams. Many approaches for analyzing data share the property that an estimation of the underlying data distribution is used as a fundamental building block. To estimate the density of a continuous data distribution, wavelet density estimation, a technique from the area of nonparametric statistics, is very appealing as it is theoretically well-founded and practically approved. For that reason, its application to data streams is highly promising; it provides a convenient way to analyze the characteristics of a stream. However, the heavy computational cost of wavelet density estimators renders their direct application to the streaming scenario impossible. In this work, we tackle this problem and present a novel approach to adaptive wavelet density estimators over data streams. Not only do our estimators meet the rigid processing requirements for data streams, they also adapt to changing system resources in a well-defined manner. A thorough experimental evaluation demonstrates the efficacy of our wavelet density estimators and shows their superiority to competing kernel- and histogram-based estimators.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115215124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}