{"title":"Beyond interoperability-tracking and managing the results of computational applications","authors":"J. Cushing, J. Laird, E. Pasalic, E. Kutter, T. Hunkapiller, F. Zucker, D. Yee","doi":"10.1109/SSDM.1997.621191","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621191","url":null,"abstract":"Molecular biology applications, like those of other scientific domains, need to store and view large amounts of specialized quantitative information. With the advent of high speed sequencing technology and considerable funding to \"map\" the genomes of key biological organisms, public databases such as GenBank, PDB, EMBL, JIPID, and SwissProt make millions of genetic sequences available to molecular biologists, and industry and university laboratories maintain large databases. The need for common interfaces and query languages to exploit these heterogeneous databases is well documented, and several such systems now exist or are under development. The authors' own work on database and program interoperability in this domain has shown, however, that providing an interface is but a first step towards making these databases fully useful. The system they are developing integrates and trades inputs and results from numerous computational biology programs. It helps researchers organize result items from sequence comparisons into \"clusters\" that can be marked, named, annotated, and manipulated. An alpha version is implemented in Smalltalk. The paper describes the scientific problem the system aims to solve, as well as current barriers to development and research opportunities suggested by those barriers. They present its conceptual data model, the current prototype, and future implementation plans.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121459124","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A simple structure for statistical meta-data","authors":"A. Westlake","doi":"10.1109/SSDM.1997.621188","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621188","url":null,"abstract":"The paper is a contribution to the debate about the nature of meta-data. The author argues that meta-data is not just data because its effective use requires functionality which is not usually present in an RDBMS. He presents a simple data structure for storing statistical meta-data and discusses the functionality needed for statistical uses. However, because meta-data is data one can use standard RDBMS facilities as well, to increase the usefulness of the meta-data beyond the basic requirements.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131860094","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Developing and accessing scientific databases with the Object-Protocol Model (OPM) data management tools","authors":"I. Chen, A. Kosky, V. Markowitz, E. Szeto","doi":"10.1109/SSDM.1997.621167","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621167","url":null,"abstract":"The Object-Protocol Model (OPM) data management tools provide facilities for rapid development, documentation, and flexible exploration of scientific databases. The tools are based on OPM, an object oriented data model which is similar to the ODMG standard, but also supports extensions for modeling scientific data (L.A. Chen and V.M. Markowitz, 1995). Databases designed using OPM can be implemented using a variety of commercial relational DBMSs, using schema translation tools that generate complete DBMS database definitions from OPM schemas (L.A. Chen and V.M. Markowitz, 1996). Further OPM schemas can be retrofitted on top of existing databases defined using a variety of notations, such as the relational data model or the ASN.1 data exchange format, using OPM retrofitting tools (L.A. Chen et al., 1997).","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"4 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117044464","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Security problems for statistical databases with general cell suppressions","authors":"T. Hsu, M. Kao","doi":"10.1109/SSDM.1997.621180","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621180","url":null,"abstract":"Studies statistical database problems for 2D tables whose regular cells, row sums, column sums and table sums may be suppressed. Using graph-theoretical techniques, we give optimal or efficient algorithms for the query system problem, the adversary problem and the minimum complementary suppression problem. These three problems are considered for a variety of data security requirements such as those of protecting linear invariants, analytic invariants, k rows (or columns) as a whole, and a table as a whole.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"48 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121544831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The S-PLUS DataBlade for INFORMIX-Universal Server. The natural wedding of an object relational database with an object-oriented data analysis engine","authors":"R. D. Martin, V. Chalana","doi":"10.1109/SSDM.1997.621173","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621173","url":null,"abstract":"The S-PLUS DataBlade module for the INFORMIX-Universal Server (IUS) combines the strength of a powerful and extensible object-relational database management system with the powerful data analysis, modeling and visualization capabilities of the object-oriented S-PLUS environment. We perform S-PLUS data analysis on data stored in IUS using SQL, the industry-standard query language. The S-PLUS DataBlade module allows S-PLUS expressions and commands to be embedded within SQL expressions, and to conveniently pass data between IUS and S-PLUS. This paper describes the architecture and the capabilities of the S-PLUS DataBlade module.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114569746","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"For scientific data discovery: why can't the archive be more like the Web?","authors":"T. Hinke, J. Rushing, Shalini Kansal, S. Graves, H. Ranganath","doi":"10.1109/SSDM.1997.621160","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621160","url":null,"abstract":"The paper addresses the problem of acquiring from scientific data, metadata that is descriptive of the actual content of the data. Scientists can use this content based metadata in subsequent archive searches to find data sets of interest. Such metadata would be especially useful in large scientific archives such as NASA's Earth Observing System Data and Information System (EOSDIS). The paper presents two generic approaches for content based metadata acquisition: target dependent and target independent. Both of these approaches are oriented toward characterizing datasets in terms of the scientific phenomena, such as mesoscale convective systems (severe storms) that they contain. In the target dependent approach, the archived data is mined for particular phenomena of interest and polygons representing the phenomena are stored in a spatial database where they can be used in the data search process. In the target independent approach, data is initially mined for deviations from normal and for trends. This data can then be used for subsequent searches for particular transient phenomena using the deviation data, or for phenomena related to trends. The paper describes results from implementing both of these approaches.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"121 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114659074","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Data mining and modeling in scientific databases","authors":"E. Kapetanios, M. Norrie","doi":"10.1109/SSDM.1997.621146","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621146","url":null,"abstract":"In the last few decades, the execution of various scientific experiments aimed at a more comprehensive understanding of one's environment, has shown a tremendous increase in data production. Database models provide a more or less adequate mechanism for mapping real-world applications into a computer-bound reality. Since scientific knowledge can be modelled a priori only to some extent, the question arises of how able a database schema is to evolve. On the other hand, knowledge can be provided by the underlying scientific data on which data mining algorithms are applied. The main question which arises is how to provide a suitable environment in order to accommodate the results coming out from data analysis tasks and how these tasks can be supported by a database model.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134638527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Constructing and maintaining scientific database views in the framework of the object-protocol model","authors":"I. Chen, A. Kosky, V. Markowitz, E. Szeto","doi":"10.1109/SSDM.1997.621192","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621192","url":null,"abstract":"Scientific databases (ScDBs) are used to archive and retrieve data describing objects of scientific inquiry. Since these ScDBs must provide continuous and efficient access to large communities of scientists, they are often developed with reliable commercial relational database management systems (DBMSs) or file systems. However, relational DBMSs and flat files do not provide constructs for representing directly ScDB-specific objects and experimental procedures, and therefore they are often hard to develop, maintain, and explore. The authors present a retrofitting tool for constructing and maintaining ScDB views using an object-oriented data model, and describe their experience with retrofitting ScDBs that have been originally developed using relational DBMSs and file systems. The retrofitting tool is part of a data management toolkit based on the object-protocol model (OPM). The OPM toolkit provides facilities for developing databases defined using OPM and for querying and browsing such ScDBs in terms of OPM constructs. The OPM retrofitting tool allows constructing (one or several) OPM views for ScDBs that have not been originally developed with the OPM tools. ScDBs with native OPM schemas or retrofitted OPM views can be browsed and queried via OPM interfaces, reorganized, or incorporated into an OPM-based database federation.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134437481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel input/output with heterogeneous disks","authors":"S. Kuo, M. Winslett, Ying Chen, Yong Cho, M. Subramaniam, K. Seamons","doi":"10.1109/SSDM.1997.621154","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621154","url":null,"abstract":"Panda is a high performance library for accessing large multidimensional array data on secondary storage of parallel platforms and networks of workstations. When using Panda as the I/O component of a scientific application, H3expresso, on the IBM SP2 at Cornell Theory Center, we found that some nodes are more powerful with respect to I/O than others, requiring the introduction of load balancing techniques to maintain high performance. We expect that heterogeneity will also be a big issue for DBMSs or parallel I/O libraries designed for scientific applications running on networks of workstations, and the methods of allocating data to servers in these environments will need to be upgraded to take heterogeneity into account, while still allowing users to exert control over data layout. We propose such an approach to load balancing, under which we respect the user's choice of high level disk layout, but introduce automatic subchunking. The use of subchunks allows us to divide the very large chunks typically specified by the user's disk layout into more manageable size units that can be allocated to I/O nodes in a manner that fairly distributes the load. We also present two techniques for allocating subchunks to nodes, static and dynamic, and evaluate their performance on the SP2.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116678149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-sample and deterministic confidence intervals for online aggregation","authors":"P. Haas","doi":"10.1109/SSDM.1997.621151","DOIUrl":"https://doi.org/10.1109/SSDM.1997.621151","url":null,"abstract":"The online aggregation system recently proposed by J.M. Hellerstein, et al. (1997) permits interactive exploration of large, complex datasets stored in relational database management systems. Running confidence intervals are an important component of an online aggregation system and indicate to the user the estimated proximity of each running aggregate to the corresponding final result. Large sample confidence intervals contain the final result with a prespecified probability and rest on central limit theorems, while deterministic confidence intervals contain the final query result with probability 1. We show how new and existing central limit theorems, simple bounding arguments, and the delta method can be used to derive formulas for both large sample and deterministic confidence intervals. To illustrate these techniques, we obtain formulas for running confidence intervals in the case of single table and multi table AVG, COUNT, SUM, VARIANCE, and STDEV queries with join and selection predicates. Duplicate elimination and GROUP-BY operations are also considered. We then provide numerically stable algorithms for computing the confidence intervals and analyze the complexity of these algorithms.","PeriodicalId":159935,"journal":{"name":"Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)","volume":"89 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"1997-08-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127198454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}