{"title":"A cost model for estimating the performance of spatial joins using R-trees","authors":"Yun-Wu Huang, N. Jing, Elke A. Rundensteiner","doi":"10.1109/SSDM.1997.621148","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621148","url":null,"abstract":"The development of a cost model for predicting the performance of spatial joins has been identified in the literature as an important and difficult problem. The authors present the first cost model that can predict the performance of spatial joins using R-trees. Based on two existing R-trees (join targets), the model first estimates the number of expected I/Os for the join process by assuming a zero buffer size. The method for this estimation extends the cost model for R-tree window queries (developed by Kamel and Faloutsos (1993) and by Pagel et al. (1993)) to also handle spatial joins (which are more complex). In the context of spatial join processing, this number of zero-buffer expected I/Os is not practical for performance prediction in a buffered environment. To model the buffer impact, they use an (exponential) distribution function to measure the probability that a bufferless I/O would cause a page fault in a buffered environment. Based on this probability and the zero-buffer expected I/O cost, the estimated number of I/Os for an R-tree join can then be computed. The comparisons between the predictions from the cost model and the actual results from the experiments based on real GIS maps show that the average relative error ratio is about 10% with a maximum of about 20% for a wide range of buffer sizes. Therefore, our model is a useful tool for the query optimization of spatial join queries.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121830820","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VANILLA: a dynamic data schema for a generic scientific database","authors":"Karla Massey, L. Kerschberg, George Michaels","doi":"10.1109/SSDM.1997.621163","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621163","url":null,"abstract":"Scientists from widely varying communities are frequently called upon to work together, sharing their data and their expertise to investigate a common issue. Difficulties frequently arise in sharing data because each community, and sometimes each scientist, has their own conventions for structuring the data. This results in a data schema that is incompatible with the other scientific communities working on the investigation. The paper presents VANILLA, a generalized data schema developed to address these issues, whose prototype shares data among the varied communities of forest canopy science. VANILLA uses techniques from semantic and dimensional databases in a federated approach to manage the sociological and technological issues in scientific data integration. All data is organized according to a thesaurus so that new data schemas can be rapidly built incorporating terminology and semantics of new scientific domains or methods. This data schema has been successfully used to store and analyze micrometeorological forest data and stem map data.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127917633","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A prototype metadata database for online analytical processing of environmental data","authors":"H. Geller, Sarah Conger, John Ertlschweiger, August J. Ryberg","doi":"10.1109/SSDM.1997.621157","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621157","url":null,"abstract":"We present preliminary results on the development of a prototype database system demonstrating the utility of the integration of environmental metadata within an online analytical processing environment. We utilized existing data derived from CD-ROMs of the National Snow and Ice Data Center (NSIDC), the Consortium for International Earth Science Information Network (CIESIN) and the US Geological Survey (USGS). We populated a prototype metadata database whose architecture facilitates the scientific and statistical investigations of geophysical parameters associated with the polar regions, allowing for data fusion from other regions and Earth science disciplines, facilitating interdisciplinary studies. The user can extract information combining the knowledge of two disparate sources of geophysical data to allow a query that would result in a useful product. Furthermore, we demonstrate the utility of allowing access to this database via the World Wide Web using an interface to the underlying Oracle database management system.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125791574","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LOGOS: a computational framework for neuroinformatics research","authors":"Michael Stiber, G. Jacobs, D. Swanberg","doi":"10.1109/SSDM.1997.621190","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621190","url":null,"abstract":"Neuroinformatics presents a great challenge to the computer science community. Quantities of data currently range up to multiple-petabyte levels. The data themselves are diverse, including scalar, vector (from 1 to 4 dimensions), volumetric (up to 4 dimensional spatio-temporal), topological, and symbolic, structured knowledge. Spatial scales range from Angstroms to meters, while temporal scales go from microseconds to decades. Base data vary greatly from individual to individual, and results computed can change with improvements in algorithms, data collection techniques, or underlying methods. The authors describe a system for managing, sharing, processing, and visualizing such data. Envisioned as a \"researcher's associate\", it will facilitate collaboration, interface between researchers and data, and perform bookkeeping associated with the complete scientific information life cycle, from collection, analysis, and publication to review of previous results and the start of a new cycle.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116196508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Query pre-execution and batching in Paradise: a two-pronged approach to the efficient processing of queries on tape-resident raster images","authors":"Jie-Bing Yu, D. DeWitt","doi":"10.1109/SSDM.1997.621153","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621153","url":null,"abstract":"The focus of the Paradise project (D. DeWitt et al., 1994; J. Patel et al., 1997) is to design and implement a scalable database system capable of storing and processing massive data sets such as those produced by NASA's EOSDIS project. The paper describes extensions to Paradise to handle the execution of queries involving collections of satellite images stored on tertiary storage. Several modifications were made to Paradise in order to make the execution of such queries both transparent to the user and efficient. First, the Paradise storage engine (the SHORE storage manager) was extended to support tertiary storage using a log-structured organization for tape volumes. Second, the Paradise query processing engine was modified to incorporate a number of novel mechanisms including query pre-execution, object abstraction, cache-conscious tape scheduling, and query batching. A performance evaluation on a working prototype demonstrates that, together, these techniques can provide a dramatic improvement over more traditional approaches to the management of data stored on tape.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125671508","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Metadata: a case study from the environmental sciences","authors":"F. Bretherton, W. Hibbard","doi":"10.1109/SSDM.1997.621182","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621182","url":null,"abstract":"Environmental data are increasingly being processed automatically into derived products which are then used for a variety of scientific purposes. Lack of adequate documentation of the quality control and transformation algorithms can seriously diminish product credibility and utility, particularly for studies of global change which require consistent information over many decades. Formal modeling of the concepts, algorithms, and data structures provides an approach for increasing the quality and reducing the burden of structuring and providing appropriate metadata. Pilot studies of seemingly simple examples have revealed significant conceptual issues such as the need for a science-based theory of approximate equivalence among different finite representations of the same physical object, and for a careful division of roles between scientists and software designers when implementing support for physical units as metadata.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"171 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132151464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scientific Databases: A Challenge in Interdisciplinary Education","authors":"L. Kerschberg, M. Kafatos, George Michaels, John Cherniasky","doi":"10.1109/SSDM.1997.621193","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621193","url":null,"abstract":"The goal of this panel is to address the issues associated with establishing Scientific and Statistical Databases as an integral part of the educational curriculum within Academe. The purpose of this panel is to initiate a dialog among the SSDBM research community, government agencies and academe to discuss the challenges associated with specifying and implementing a Scientific Database Education (SDBE) Curriculum. The panelists bring experience in facilitating and teaching such courses through traditional and distance education models.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"2067 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129826261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ESMDIS: Earth System Model Data Information System","authors":"Y. Chi, C. Mechoso, M. Stonebraker, K. Sklower, R. Troy, R. Muntz, E. Mesrobian","doi":"10.1109/SSDM.1997.621169","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621169","url":null,"abstract":"The goals of the development of the Earth System Model Data Information System (ESMDIS) are to provide Earth scientists with: 1) an output management system for the Earth System Model (ESM) to browse the metadata and retrieve a desired subset of ESM output; 2) an analysis system for ESM output and other related datasets; 3) an automated pipelining system for ESM data processing; 4) a visualization system; and 5) a Web-based user interface to utilize the system. ESMDIS is based on a DBMS-centric approach, built upon the \"BigSur\" Earth science data schema, and developed using an object-relational DBMS. We have built a prototype ESMDIS, and present the results of its development.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124731567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Knowledge discovery in an earthquake text database: correlation between significant earthquakes and the time of day","authors":"J. Goldman, D. S. Parker, W. Chu","doi":"10.1109/SSDM.1997.621144","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621144","url":null,"abstract":"The authors take a real world application from a text database and present a case history. The techniques ultimately led to a discovery contradicting an accepted paradigm in seismology. Using simple, tailored keyword extraction, they examined a text collection of earthquake data. A discovery was made when an unusual pattern emerged from the text. They then tested a more comprehensive numerical database, treating the text discovery as a hypothesis. It was verified using a standard χ² statistic. The hypothesis was that significant earthquakes in the longitude regions that include California occur more often in the morning hours than at any other time of day.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130917841","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A spatial data cube concept to support data analysis in environmental epidemiology","authors":"V. Kamp, L. Sitzmann, Frank Wietek","doi":"10.1109/SSDM.1997.621161","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621161","url":null,"abstract":"The project CARLOS (Cancer Registry Lower-Saxony) developed the Epidemiological and Statistical Data Exploration System (CARESS) to support multidimensional analysis of health data. The system is based on an architecture that focuses on extensive interoperability between a database management system and several analysis and visualisation tools. As spatial and statistical aspects of the data play an important role, CARESS provides special support for the integration of both.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"86 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114619114","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}