{"title":"Entropy based approximate querying and exploration of datacubes","authors":"Themis Palpanas, Nick Koudas","doi":"10.1109/SSDM.2001.938541","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938541","url":null,"abstract":"Much research has been devoted to the efficient computation of relational aggregations and specifically the efficient execution of the datacube operation. We consider the inverse problem, that of deriving (approximately) the original data from the aggregates. We motivate this problem in the context of two specific application areas, that of approximate query answering and data analysis. We propose a framework based on the notion of information entropy that enables us to estimate the original values in a data set, given only aggregated information about it. We also describe an alternate utility of the proposed framework, that enables us to identify values that deviate from the underlying data distribution, suitable for data mining purposes. Finally, we present a detailed performance study of the algorithms using both real and synthetic data, highlighting the benefits of our approach as well as the efficiency of the proposed solutions.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121082127","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evolutionary design and development of image meta-analysis environments based on object-relational database mediator technology","authors":"J. Fredriksson, P. Svensson","doi":"10.1109/SSDM.2001.938551","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938551","url":null,"abstract":"Discusses how emerging object-relational database mediator technology can be used to integrate academic freeware and commercial-off-the-shelf software components to create a sequence of gradually more complex and powerful, yet always syntactically and semantically homogeneous, database-centred image meta-analysis environments. We show how this may be done by defining and utilising a use-case-based evolutionary design and development process. This process allows subsystems to be produced largely independently by several small specialist subprojects, turning the system integration work into a high-level domain modelling task.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131415149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exploring the behavior of the spring ecosystem model using an object-oriented database system","authors":"D. Mikesell, J. Pfaltz","doi":"10.1109/SSDM.2001.938561","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938561","url":null,"abstract":"Landscape and ecosystem models have typically been viewed as \"black boxes\", that given a set of inputs yield a set of outputs. This view does not easily lend itself to investigations as to why a model behaves as it does, especially when multiple models are coupled together to create a larger model. We hypothesize that, for this type of investigation, a visual multimedia tool is necessary to gain insight into the temporal behavior of the model. In order to facilitate the exploration of the behavior of one particular well-known landscape model, we have coupled the model with an object oriented database system and a data plotting and visualization package. Using this system, we have created animations of the model's output and used them to discover interesting properties of the model. In our demonstration, we display these animations and discuss how such a system can be used as an aid to explore hypotheses about the model's behavior.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123559531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metacat: a schema-independent XML database system","authors":"Chad Berkley, Matthew B. Jones, Jivka Bojilova, Dan Higgins","doi":"10.1109/SSDM.2001.938549","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938549","url":null,"abstract":"The ecological sciences represent a challenging community from the perspective of scientific data management. Ecological data are collected by investigators who are spread out over a large geographic area and who use a wide variety of research protocols and data-handling techniques. The resulting heterogeneous data are stored in autonomous database systems that are dispersed throughout the ecological community. The Knowledge Network for Biocomplexity is seeking to address these issues through the use of structured metadata encoded in the Extensible Markup Language (XML). The main goal of this project has been to design and implement a schema-independent data storage system for XML which is called Metacat. Metacat uses a hybrid XML storage approach using a commercial relational DBMS back-end while still allowing any arbitrary XML document to be stored. This paper describes the Metacat XML data storage system and its relevance to scientific data management in the ecological sciences.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125516414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling statistical metadata","authors":"H. Papageorgiou, F. Pentaris, Eirini Theodorou, M. Vardaki, M. Petrakos","doi":"10.1109/SSDM.2001.938535","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938535","url":null,"abstract":"An object oriented statistical metadata model is presented, which can be used in building information systems providing metadata-guided, statistical data processing features. The semantics of the model are analyzed and a set of operators (transformations) is proposed that allows for the automatic manipulation of both data and metadata at the same time. We discuss the mathematical properties of these transformations, and subsequently as a case study, we demonstrate how a statistical office can use the presented framework to build a Web site offering ad hoc query capabilities to its data consumers.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"8 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114114053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Ontology negotiation between scientific archives","authors":"S. Bailin, W. Truszkowski","doi":"10.1109/SSDM.2001.938557","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938557","url":null,"abstract":"Describes an approach to ontology negotiation between information agents. Ontologies are declarative (data-driven) expressions of an agent's \"world\": the objects, operations, facts and rules that constitute the logical space within which an agent performs. Ontology negotiation enables agents to cooperate in performing a task, even if they are based on different ontologies. The process allows agents to discover ontology conflicts and then, through incremental interpretation, clarification and explanation, establish a common basis for communication with each other.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115196130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient disk allocation schemes for parallel retrieval of multidimensional grid data","authors":"Chung-Min Chen, R. Sinha, R. Bhatia","doi":"10.1109/SSDM.2001.938553","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938553","url":null,"abstract":"Declustering schemes enable parallel data retrieval by placing data blocks across multiple disk devices. Various declustering schemes have been proposed for multidimensional data to reduce the response time of range queries. However, efficient schemes, which must be easy to compute and provide good performance, are only known for a restricted number of disks and dimensions. In this paper, we propose a novel technique to construct efficient multidimensional declustering schemes, for any number of disks and dimensions. Simulation results show that the new schemes outperform the best previously-known non-exhaustive search-based multidimensional declustering schemes.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124261593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An XML-based distributed metadata server (DIMES) supporting Earth science metadata","authors":"Ruixin Yang, Xinhua Deng, M. Kafatos, Changzhou Wang, X. Wang","doi":"10.1109/SSDM.2001.938558","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938558","url":null,"abstract":"With explosively increasing volumes of remote sensing, modelling and other Earth science data available, and the popularity of the Internet, scientists are now facing challenges to publish and to find interesting data sets effectively and efficiently. Metadata has been recognized as a key technology to ease the searching and retrieval of Earth science data. In this paper, we discuss the DIMES (DIstributed MEtadata Server) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve and interoperate metadata in a distributed environment. DIMES accepts metadata in any well-formed XML format and thus assumes the \"tree\" semantics of metadata entries. Additional domain knowledge can be represented as specific links through XML's ID/IDREF mechanism. DIMES provides a number of mechanisms, including the \"nearest-neighbor search\", to navigate and to search metadata. Though started for the Earth science community, DIMES can be easily extended to serve scientific communities in other disciplines.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"214 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122655977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Clustering algorithms and validity measures","authors":"M. Halkidi, Yannis Batistakis, M. Vazirgiannis","doi":"10.1109/SSDM.2001.938534","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938534","url":null,"abstract":"Clustering aims at discovering groups and identifying interesting distributions and patterns in data sets. Researchers have extensively studied clustering since it arises in many application domains in engineering and social sciences. In the last years the availability of huge transactional and experimental data sets and the arising requirements for data mining created needs for clustering algorithms that scale and can be applied in diverse domains. The paper surveys clustering methods and approaches available in the literature in a comparative way. It also presents the basic concepts, principles and assumptions upon which the clustering algorithms are based. Another important issue is the validity of the clustering schemes resulting from applying algorithms. This is also related to the inherent features of the data set under concern. We review and compare clustering validity measures available in the literature. Furthermore, we illustrate the issues that are under-addressed by the recent algorithms and we address new research directions.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"102 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122510040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rewrite rules for quantified subqueries in a federated database","authors":"G. Kemp, P. Gray, Andreas R. Sjöstedt","doi":"10.1109/SSDM.2001.938546","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938546","url":null,"abstract":"Transforming queries for efficient execution is particularly important in federated database systems since a more efficient execution plan can require many fewer data requests to be sent to the component databases. Also, it is important to do as much as possible of the selection and processing close to where the data are stored, making best use of facilities provided by the federation's component database management systems. We address the problem of processing complex queries including quantifiers, which have to be executed against different databases in an expanding heterogeneous federation. This is done by transforming queries within a mediator for global query improvement, and within wrappers to make best use of the query processing capabilities of external databases. Our approach is based on pattern matching and query rewriting. We introduce a high level language for expressing rewrite rules declaratively, and demonstrate the use and flexibility of such rules in improving query performance for existentially quantified subqueries. Extensions to this language that allow generic rewrite rules to be expressed are also presented. The value of performing final transformations within a wrapper for a given remote database is shown in several examples that use AMOS II-an SQLS-like system.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131198020","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}