{"title":"Agents and databases: friends or foes?","authors":"P. Lockemann, R. Witte","doi":"10.1109/IDEAS.2005.8","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.8","url":null,"abstract":"On first glance agent technology seems more like a hostile intruder into the database world. On the other hand, the two could easily complement each other, since agents carry out information processes whereas databases supply information to processes. Nonetheless, to view agent technology from a database perspective seems to question some of the basic paradigms of database technology, particularly the premise of semantic consistency of a database. The paper argues that the ensuing uncertainty in distributed databases can be modelled by beliefs, and develops the basic concepts for adjusting peer-to-peer databases to the individual beliefs in single nodes and collective beliefs in the entire distributed database.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128294869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Privacy aware data generation for testing database applications","authors":"Xintao Wu, Chintan Sanghvi, Yongge Wang, Yuliang Zheng","doi":"10.1109/IDEAS.2005.45","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.45","url":null,"abstract":"Testing of database applications is of great importance. A significant issue in database application testing consists in the availability of representative data. In this paper, we investigate the problem of generating a synthetic database based on a-priori knowledge about a production database. Our approach is to fit general location model using various characteristics (e.g., constraints, statistics, rules) extracted from the production database and then generate the synthetic data using model learnt. The generated data is valid and similar to real data in terms of statistical distribution, hence it can be used for functional and performance testing. As characteristics extracted may contain information which may be used by attacker to derive some confidential information about individuals, we present our disclosure analysis method which applies cell suppression technique for identity disclosure analysis and perturbation for value disclosure.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"236 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129904617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semantic query transformation using ontologies","authors":"Chokri Ben Necib, J. Freytag","doi":"10.1109/IDEAS.2005.51","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.51","url":null,"abstract":"Traditional approaches to query processing aim at rewriting a given query into another more efficient one that uses less time and/or resources during the execution. There by, the rewritten query must be equivalent to the initial one, i.e., it must provide the same result. However, rewriting queries in equivalent ways do not always satisfy the user's needs, in particular when the user does not receive any answer at all. In this paper, we propose a new approach for query processing which allows to rewrite a query into another one which is not necessary equivalent but can provide more meaningful result satisfying the user's intention. For this purpose, we illustrate how semantic knowledge inform of ontologies could be effectively used. We develop a set of rewriting rules which rely on semantic information extracted form the ontology associated with the database. In addition, we discuss features of the necessary mappings between the ontology and its underlying database.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114307903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"XML and relational data: towards a common model and algebra","authors":"Matteo Magnani, D. Montesi","doi":"10.1109/IDEAS.2005.55","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.55","url":null,"abstract":"In this paper we present a model for the management of relational, XML, and mixed data. The main high-level approaches to manipulate XML, i.e., SQL/XML, XQuery, and object/relational XML columns, can all be based on our common model and algebra. Our query algebra, yet very simple, can represent queries not expressible by other proposals and by the current implementation of TAX. Moreover, we show that relational-like logical query rewriting can be extended to our algebraic expressions.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132505009","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NP Datalog: a logic language for NP search and optimization queries","authors":"S. Greco, I. Trubitsyna, E. Zumpano","doi":"10.1109/IDEAS.2005.38","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.38","url":null,"abstract":"This paper presents a logic language, called NP Datalog for NP search and optimization problems. The 'search' language extends stratified Datalog with constraints and partition rules defining (nondeterministically) partition of relations. NP optimization problems are then formulated by adding a max (or min) construct to select the solution (stable model) which maximizes (resp., minimizes) the result of a polynomial function applied to the answer relation. We show that NP Datalog queries can be easily evaluated by translating them into ILOG programs which are next solved by means of the ILOG OPL Studio suite. To prove the effectiveness of our proposal, we have implemented a module, written in Sicstus Prolog, which takes in input a NP Datalog query and outputs an equivalent ILOG program. Several experiments comparing the computation of queries by different logic systems have been also performed.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"174 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133819067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differencing data streams","authors":"S. Chawathe","doi":"10.1109/IDEAS.2005.21","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.21","url":null,"abstract":"We present external-memory algorithms for differencing large hierarchical datasets. Our methods are especially suited to streaming data with bounded differences. For input sizes m and n and maximum output (difference) size e, the I/O, RAM, and CPU costs of our algorithm rdiff are, respectively, m + n, 4e + 8, and O(MN). That is, given 4e + 8 blocks of RAM, our algorithm performs no I/O operations other than those required to read both inputs. We also present a variant of the algorithm that uses only four blocks of RAM, with I/O cost 8me + 18m + n + 6e + 5 and CPU cost O(MN).","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132082531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An incremental clustering scheme for duplicate detection in large databases","authors":"Eugenio Cesario, Francesco Folino, G. Manco, L. Pontieri","doi":"10.1109/IDEAS.2005.10","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.10","url":null,"abstract":"We propose an incremental algorithm for clustering duplicate tuples in large databases, which allows to assign any new tuple t to the cluster containing the database tuples which are most similar to t (and hence are likely to refer to the same real-world entity t is associated with). The core of the approach is a hash-based indexing technique that tends to assign highly similar objects to the same buckets. Empirical evaluation proves that the proposed method allows to gain considerable efficiency improvement over a state-of-art index structure for proximity searches in metric spaces.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124864331","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically maintaining wrappers for Web sources","authors":"J. Raposo, A. Pan, M. Álvarez, Justo Hidalgo","doi":"10.1109/IDEAS.2005.13","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.13","url":null,"abstract":"A substantial subset of the Web data follows some kind of underlying structure. Nevertheless, HTML does not contain any schema or semantic information about the data it represents. A program able to provide software applications with a structured view of those semi-structured Web sources is usually called a wrapper. Wrappers are able to accept a query against the source and return a set of structured results, thus enabling applications to access Web data in a similar manner to that of information from databases. A significant problem in this approach arises because Web sources may experiment changes that invalidate the current wrappers. In this paper, we present novel heuristics and algorithms to address this problem. Our approach is based on collecting some query results during wrapper operation. Then, when the source changes, they are used to generate a set of labeled examples that are then provided as input to a wrapper induction algorithm able to regenerate the wrapper. We have tested our methods in several real-world Web data extraction domains, obtaining high accuracy in all the steps of the process.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"91 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133674217","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the intersection of XPath expressions","authors":"B. Hammerschmidt, Martin Kempa, V. Linnemann","doi":"10.1109/IDEAS.2005.39","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.39","url":null,"abstract":"XPath is a common language for selecting nodes in an XML document. XPath uses so called path expressions which describe a navigation path through semistructured data. In the last years some of the characteristics of XPath have been discussed. Examples include the containment of two XPath expressions p and p' (p /spl sube/ p'). To the best of our knowledge the intersection of two XPath expressions (p /spl cap/ p') has not been treated yet. The intersection of p and p' is the set that contains all XML nodes that are selected both by p and p'. In the context of indexes in XML databases the emptiness of the intersection of p and p' is a major issue when updating the index. In order to keep the index consistent to the indexed data, it has to be detected if an index that is defined upon p is affected by a modifying database operation with the path expression p'. In this paper, we introduce the intersection problem for XPath and give a motivation for its relevance. We present an efficient intersection algorithm for XPath expressions without the NOT operator that is based on finite automata. For expressions that contain the NOT operator the intersection problem becomes NP-complete leading to exponential computations in general. 
With an average case simulation we show that the NP-completeness is no significant limitation for most real-world database operations.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"82 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116726165","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Incremental methods for simple problems in time series: algorithms and experiments","authors":"Xiaojian Zhao, Xin Zhang, T. Neylon, D. Shasha","doi":"10.1109/IDEAS.2005.35","DOIUrl":"https://doi.org/10.1109/IDEAS.2005.35","url":null,"abstract":"A time series (or equivalently a data stream) consists of data arriving in time order. Single or multiple data streams arise in fields including physics, finance, medicine, and music, to name a few. Often the data comes from sensors (in physics and medicine for example) whose data rates continue to improve dramatically as sensor technology improves and as the number of sensors increases. So fast algorithms become ever more critical in order to distill knowledge from the data. This paper presents our recent work regarding the incremental computation of various primitives: windowed correlation, matching pursuit, sparse null space discovery and elastic burst detection. The incremental idea reflects the fact that recent data is more important than older data. Our StatStream system contains an implementation of these algorithms, permitting us to do empirical studies on both simulated and real data.","PeriodicalId":357591,"journal":{"name":"9th International Database Engineering & Application Symposium (IDEAS'05)","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2005-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122012506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}