{"title":"Integrating distributed scientific data sources with MOCHA and XRoaster","authors":"M. Rodríguez-Martínez, N. Roussopoulos, J. McGann","doi":"10.1109/SSDM.2001.938560","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938560","url":null,"abstract":"MOCHA is a novel middleware system for integrating distributed data sources that we have developed at the University of Maryland. MOCHA is based on the idea that the code that implements user-defined types and functions should be automatically deployed to remote sites by the middleware system itself. To this end, we have developed an XML-based framework to specify metadata about data sites, data sets, and user-defined types and functions. XRoaster is a graphical tool that we have developed to help the user create all the XML metadata elements to be used in MOCHA.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125432169","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An information system for distillation data farming","authors":"A. Lewis, J. Berlin, Ted Meyer, S. Kruglikov, Steve Miller, John W. Lyver IV, R. Gharavi","doi":"10.1109/SSDM.2001.938563","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938563","url":null,"abstract":"Project Albert is the US Marine Corps' program for researching the new sciences and leveraging advances in computing power and data analysis to better understand the effects of non-linearity, intangibles, and co-evolving landscapes on answers to military decision-makers' operational questions. The paper describes the design, architecture, and prototyping of an information system to support Data Farming, an analysis method to process and analyze large numbers of simulations. Data Farming and the prototype system are described including semantics, the database schema, sample queries, prototype client/server software and servlets, query generation software, and a visualization tool.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130703900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-based unbalanced R-trees","authors":"K. A. Ross, I. Sitzmann, Peter James Stuckey","doi":"10.1109/SSDM.2001.938552","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938552","url":null,"abstract":"Cost-based unbalanced R-trees (CUR-trees) are a cost-function-based data structure for spatial data. CUR-trees are constructed specifically to improve the evaluation of intersection queries, the most basic selection query in an R-tree. A CUR-tree is built taking into account a given query distribution for the queries and a cost model for their execution. Depending on the expected frequency of access, objects or subtrees are stored higher up in the tree. After each insertion in the tree, local reorganizations of a node and its children have their expected query cost evaluated, and a reorganization is performed if this is beneficial. No strict balancing of the trees applies, allowing the tree to unfold solely based on the result of the cost evaluation. We present our cost-based approach and describe the evaluation and reorganization operations based on the cost function. We present a cost model for in-memory access costs and we present three different query models. In our experiments, we compare the performance of the CUR-tree to the R-tree and the R*-tree. The CUR-tree is able to significantly improve intersection query performance, without unacceptably increasing the cost of building the tree. The use of R-trees for in-memory data reflects the high (and growing) cost of bringing data from RAM into the CPU cache relative to the cost of other computations.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114932117","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Efficient historical R-trees","authors":"Yufei Tao, D. Papadias","doi":"10.1109/SSDM.2001.938554","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938554","url":null,"abstract":"The historical R-tree (HR-tree) is a spatio-temporal access method aimed at the retrieval of window queries in the past. The concept behind the method is to keep an R-tree for each timestamp in history, but to allow consecutive trees to share branches when the underlying objects do not change. New branches are only created to accommodate updates from the previous timestamp. Although existing implementations of HR-trees process timestamp (window) queries very efficiently, they are hardly applicable in practice due to excessive space requirements and poor interval query performance. This paper addresses these problems by proposing the HR+-tree, which occupies a small fraction of the space required for the corresponding HR-tree (for typical conditions about 20%), while improving interval query performance several times. Our claims are supported by extensive experimental evaluation.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"11 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123156950","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Earth System Science Workbench: a data management infrastructure for earth science products","authors":"J. Frew, R. Bose","doi":"10.1109/SSDM.2001.938550","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938550","url":null,"abstract":"The Earth System Science Workbench (ESSW) is a non-intrusive data management infrastructure for researchers who are also data publishers. An implementation of ESSW to track the processing of locally received satellite imagery is presented, demonstrating the Workbench's transparent and robust support for archiving and publishing data products. ESSW features a Lab Notebook metadata service, an ND-WORM (No Duplicate-Write Once Read Many) storage service, and Web user interface tools. The Lab Notebook logs processes (experiments) and their relationships via a custom API to XML documents stored in a relational database. The ND-WORM provides a managed storage archive for the Lab Notebook by keeping unique file digests and name-space meta-data, also in a relational database. ESSW Notebook tools allow project searching and ordering, and file and meta-data management.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133281300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using association rules to add or eliminate query constraints automatically","authors":"A. Trigoni, K. Moody","doi":"10.1109/SSDM.2001.938545","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938545","url":null,"abstract":"Much interesting work has been done on the use of semantic associations for optimizing query execution. Our objective is to study the use of association rules to add or eliminate constraints in the where clause of a select query. In particular, we take advantage of the following heuristics presented by Siegel et al. (1992): i) if a selection on attribute A is implied by another selection condition on attribute B and A is not an index attribute, then the selection on A can be removed from the query; ii) if a relation R in the query has a restricted attribute A and an unrestricted cluster index attribute B, then look for a rule where the restriction on A implies a restriction on B. The contribution of our work is twofold. First, we present detailed algorithms that apply these heuristics. Hence, our ideas are easy to implement. Second, we discuss conditions under which it is worth applying these optimization techniques, and we show the extent to which they speed up query execution.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"72 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123472651","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OLAP databases and aggregation functions","authors":"H. Lenz, B. Thalheim","doi":"10.1109/SSDM.2001.938542","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938542","url":null,"abstract":"Aggregation functions are a class of generic functions which must be usable in any database application. We characterize the case where the aggregation functions can be correctly applied on macrodata (data cube) which are computed on the microdata.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125030327","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"2D TSA-tree: a wavelet-based approach to improve the efficiency of multi-level spatial data mining","authors":"C. Shahabi, Seokkyung Chung, Maytham Safar, G. Hajj","doi":"10.1109/SSDM.2001.938538","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938538","url":null,"abstract":"Due to the large amount of the collected scientific data, it is becoming increasingly difficult for scientists to comprehend and interpret the available data. Moreover typical queries on these data sets are in the nature of identifying (or visualizing) trends and surprises at a selected sub-region in multiple levels of abstraction rather than identifying information about a specific data point. The authors propose a versatile wavelet-based data structure, 2D TSA-tree (Trend and Surprise Abstractions Tree), to enable efficient multi-level trend detection on spatial data at different levels. We show how 2D TSA-tree can be utilized efficiently for sub-region selections. Moreover, 2D TSA-tree can be utilized to precompute the reconstruction error and retrieval time of a data subset in advance in order to allow the user to trade off accuracy for response time (or vice versa) at query time. Finally, when the storage space is limited, our 2D Optimal TSA-tree saves on storage by storing only a specific optimal subset of the tree. To demonstrate the effectiveness of our proposed methods, we evaluated our 2D TSA-tree using real and synthetic data. Our results show that our method outperformed other methods (DFT and SVD) in terms of accuracy, complexity and scalability.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"491 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125670647","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Semistructured probabilistic databases","authors":"Alex Dekhtyar, J. Goldsmith, Sean R. Hawkes","doi":"10.1109/SSDM.2001.938536","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938536","url":null,"abstract":"The article describes a novel theoretical framework for uniform storage and management of diverse probabilistic information. The semistructured data model has gained wide acceptance recently as a means of representing data which lacks a rigid structure or schema. In particular, the similarity of the semistructured data model and the underlying data model for eXtensible Markup Language (XML), the emerging open standard for data storage and transmission over the Internet, makes our choice of this approach attractive. The authors present the formal model for semistructured probabilistic objects. They provide the theoretical foundations for storing and managing semistructured probabilistic objects. Previously (S. Hawkes and A. Dekhtyar, 2001), we started the process of translating this model into XML. We introduce the advising application and give formal definitions of semistructured probabilistic objects. Finally, we introduce the underlying algebra for semistructured probabilistic databases.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126394872","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Content-based browsing of data from the Tropical Rainfall Measuring Mission (TRMM)","authors":"O. Kelley, J. Stout, M. Kafatos","doi":"10.1109/SSDM.2001.938562","DOIUrl":"https://doi.org/10.1109/SSDM.2001.938562","url":null,"abstract":"Through content-based browsing, the TSDIS Orbit Viewer can help scientists decide which files to order from the TRMM archive. The Orbit Viewer's Mission Index can locate large-scale rain events in six terabytes of data. The Orbit Viewer's TRMM Tracker can locate coincidences between the TRMM orbit and a user-defined surface track.","PeriodicalId":129323,"journal":{"name":"Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2001-07-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131268692","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}