{"title":"Modeling and language support for the management of pattern-bases","authors":"Manolis Terrovitis, Panos Vassiliadis, Spiros Skiadopoulos, E. Bertino, B. Catania, Anna Maddalena","doi":"10.1109/SSDBM.2004.54","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.54","url":null,"abstract":"Nowadays, knowledge extraction methods are able to produce artifacts (also called patterns) that concisely represent data. Patterns are usually quite heterogeneous and require ad-hoc processing techniques. So far, little emphasis has been placed on developing an overall integrated environment for uniformly representing and querying different types of patterns. Within the larger context of modelling, storing, and querying patterns, in this paper, we: (a) formally define the logical foundations for the global setting of pattern management through a model that covers data, patterns and their intermediate mappings; (b) present a pattern specification language for pattern management along with safety restrictions; and (c) introduce queries and query operators and identify interesting query classes.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"116 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126912030","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge discovery from databases on the semantic Web","authors":"B. Scotney, S. McClean","doi":"10.1109/SSDBM.2004.45","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.45","url":null,"abstract":"We provide a flexible method for knowledge discovery from semantically heterogeneous data, based on the specification of ontology mappings from the local data sources to pre-existing (superior) ontologies in an ontology server. We also provide an innovative method for the construction of a dynamic shared ontology; data integration is then carried out by minimisation of the Kullback-Leibler information divergence using the EM algorithm. The new knowledge learned by this process is potentially richer than any of the contributing data sources. We also show how the approach may be extended to knowledge discovery from a number of database attributes; association rules or Bayesian belief networks may then be induced. An architecture for a KDD system in such an environment is described; this is an extension of a previous architecture for distributed data processing that we have already implemented.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122113333","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Kepler: an extensible system for design and execution of scientific workflows","authors":"I. Altintas, Chad Berkley, Efrat Jaeger, Matthew B. Jones, Bertram Ludäscher, S. Mock","doi":"10.1109/SSDBM.2004.44","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.44","url":null,"abstract":"Most scientists conduct analyses and run models in several different software and hardware environments, mentally coordinating the export and import of data from one environment to another. The Kepler scientific workflow system provides domain scientists with an easy-to-use yet powerful system for capturing scientific workflows (SWFs). SWFs are a formalization of the ad-hoc process that a scientist may go through to get from raw data to publishable results. Kepler attempts to streamline the workflow creation and execution process so that scientists can design, execute, monitor, re-run, and communicate analytical procedures repeatedly with minimal effort. Kepler is unique in that it seamlessly combines high-level workflow design with execution and runtime interaction, access to local and remote data, and local and remote service invocation. SWFs are superficially similar to business process workflows but have several challenges not present in the business workflow scenario. For example, they often operate on large, complex and heterogeneous data, can be computationally intensive and produce complex derived data products that may be archived for use in reparameterized runs or other workflows. Moreover, unlike business workflows, SWFs are often dataflow-oriented as witnessed by a number of recent academic systems (e.g., DiscoveryNet, Taverna and Triana) and commercial systems (Scitegic/Pipeline-Pilot, Inforsense). In a sense, SWFs are often closer to signal-processing and data streaming applications than they are to control-oriented business workflow applications.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129706543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multiscale classification of moving objects trajectories","authors":"C. Mouza, P. Rigaux","doi":"10.1109/SSDBM.2004.55","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.55","url":null,"abstract":"In this paper we propose a classification model for moving objects trajectories. We assume that the classification is based on a multiscale map, and we simply define a trajectory pattern as the sequence of zones an object crosses during its travel. These patterns constitute the basis of classification operators. We also define a pattern-based query language which allows an online and continuous classification of moving objects. Finally a prototype which shows the validity of the approach is briefly described.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129961882","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hierarchical stream aggregates: querying nested stream sessions","authors":"Damianos Chatziantoniou, A. Anagnostopoulos","doi":"10.1109/SSDBM.2004.40","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.40","url":null,"abstract":"This article identifies an interesting class of applications where stream sessions may be organized in a hierarchical fashion - i.e. sessions may consist of sub-sessions. We argue that data streams of this kind have rich procedural semantics - i.e. behavior - and therefore a semantically rich model should be used: a session may be defined by opening and closing conditions, may have data and methods and may consist of sub-sessions. We propose a simple conceptual model based on the notion of \"session\", similar to a class in an object-oriented environment, having lifetime semantics. Queries on top of this schema can be formulated via HSA (hierarchical stream aggregate) expressions.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131029875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BASS: approximate search on large string databases","authors":"Jiong Yang, Wei Wang, Philip S. Yu","doi":"10.1109/SSDBM.2004.20","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.20","url":null,"abstract":"In this paper, we study the problem of how to build an index structure for large string databases to efficiently support various types of string matching without the necessity of mapping the substrings to a numerical space (e.g., string B-tree and MRS-index) or the restriction of in-memory practice (e.g., suffix tree and suffix array). Towards this goal, we propose a new indexing scheme, BASS-tree, to efficiently support general approximate substring match (in terms of certain symbol substitutions and misalignments) in sublinear time on a large string database. The key idea behind the design is that all positions in each string are grouped recursively into a fully balanced tree according to the similarities of the subsequent segments starting at those positions. Each node is labeled with a regular expression that describes the commonality of the substrings indexed through the subtree. Any search can then be properly directed to the portion in the database with a high potential of matching quickly. With the BASS-tree in place, wild card(s) in the query pattern can also be handled in a seamless way. In addition, search of a long pattern can be decomposed into a series of searches of short segments followed by a process to join the results. It has been demonstrated in our experiments that the potential performance improvement brought by BASS-tree is in an order of magnitude over alternative methods.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126053445","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Merging R-trees","authors":"Vasilis Vasaitis, A. Nanopoulos, Panayiotis Bozanis","doi":"10.1109/SSDBM.2004.50","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.50","url":null,"abstract":"R-trees, since their introduction in 1984, have been proven to be one of the most well-behaved practical data structures for accommodating dynamic massive sets of geometric objects and conducting a diverse set of queries on such data-sets in real-world applications. In this paper we introduce a new technique for merging two R-trees into a new one of very good quality. Our method avoids both the employment of bulk insertions and the solution of bulk-loading, from scratch, the new tree using the data of the original trees. Additionally, unlike previous approaches, it does not make any assumptions about data-set distributions. Experimental results provide evidence on the runtime efficiency of our method and illustrate the good query performance of the produced indices.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129616311","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Discovery of serial episodes from streams of events","authors":"T. Mielikainen","doi":"10.1109/SSDBM.2004.30","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.30","url":null,"abstract":"A very important problem in data mining is finding patterns from sequential data. There is a vast number of sources for sequential data such as biological sequences, text documents, telecommunication alarm sequences, click streams, market basket data, Web logs, and other time series. One of the most popular patterns mined from sequential data are the episodes, i.e., directed acyclic graphs with labeled nodes (Mannila et al., 1997). An important subclass of episodes are the serial episodes, which are essentially sequences. Serial episodes are useful in many applications, including network monitoring and molecular biology. Currently, there are many situations where so much sequential data is produced that it cannot even be stored without great difficulties. Such sequential sources are called data streams. In this paper we focus on finding serial episodes from data streams. To the best of our knowledge the problem of mining serial episodes from data streams has been studied in depth only for length-1 episodes (Karp et al., 2003).","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120947553","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mining deviants in time series data streams","authors":"S. Muthukrishnan, R. Shah, J. Vitter","doi":"10.1109/SSDBM.2004.51","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.51","url":null,"abstract":"One of the central tasks in managing, monitoring and mining data streams is that of identifying outliers. There is a long history of study of various outliers in statistics and databases, and a recent focus on mining outliers in data streams. Here, we adopt the notion of \"deviants\" from Jagadish et al. (1999) as outliers. Deviants are based on one of the most fundamental statistical concepts, standard deviation (or variance). Formally, deviants are defined based on a representation sparsity metric, i.e., deviants are values whose removal from the dataset leads to an improved compressed representation of the remaining items. Thus, deviants are not global maxima/minima, but rather these are appropriate local aberrations. Deviants are known to be of great mining value in time series databases. We present first-known algorithms for identifying deviants on massive data streams. Our algorithms monitor streams using very small space (polylogarithmic in data size) and are able to quickly find deviants at any instant, as the data stream evolves over time. For all versions of this problem - uni- vs multivariate time series, optimal vs near-optimal vs heuristic solutions, offline vs streaming - our algorithms have the same framework of maintaining a hierarchical set of candidate deviants that are updated as the time series data gets progressively revealed. We show experimentally using real network traffic data (SNMP aggregate time series) as well as synthetic data that our algorithm is remarkably accurate in determining the deviants.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"212 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122660265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spatial join for high-resolution objects","authors":"H. Kriegel, Peter Kunath, M. Pfeifle, M. Renz","doi":"10.1109/SSDBM.2004.64","DOIUrl":"https://doi.org/10.1109/SSDBM.2004.64","url":null,"abstract":"Modern database applications including computer-aided design (CAD), medical imaging, molecular biology, or multimedia information systems impose new requirements on efficient spatial query processing. One of the most common query types in spatial database management systems is the spatial join. In this paper, we investigate spatial join processing for two sets of very complex spatial objects. We present an approach that is based on a fast filter step performing the spatial join on simple primitives which conservatively approximate the objects. Our main attention is focused on the problem of how to generate approximations adequate for high-resolution objects. In this paper, we introduce gray approximations as a general concept which helps to range between replicating and nonreplicating object approximations. The key idea of our approach is to build these replications based on statistical information taking the data distribution of the respective join-partner relation into account. Furthermore, our approach uses compression techniques for the effective storage and retrieval of the decomposed spatial objects. We demonstrate the benefits of our new method for the spatial intersection join on high resolution data. The experimental evaluation on real-world test data points out that our new concept accelerates the spatial intersection join considerably.","PeriodicalId":383615,"journal":{"name":"Proceedings. 16th International Conference on Scientific and Statistical Database Management, 2004.","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2004-06-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123337810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}