Graywulf: a platform for federated scientific databases and services

Proceedings of the 25th International Conference on Scientific and Statistical Database Management · Pub Date: 2013-07-29 · DOI: 10.1145/2484838.2484863

L. Dobos, I. Csabai, A. Szalay, T. Budavári, Nolan Li


Citations: 6

Abstract

Many fields of science rely on relational database management systems (RDBMS) to analyze, publish and share data. Because RDBMS were originally designed for business use cases, which still drive most of their development, they often lack features that are important for scientific applications. Horizontal scalability is probably the most important missing feature, and its absence makes it challenging to adapt traditional relational database systems to ever-growing data sizes. Because support for array data types and metadata management is limited, applying an RDBMS successfully in science usually requires developing custom extensions. While some of these extensions are specific to a single field of science, the majority could easily be generalized and reused in other disciplines. The Graywulf project targets several goals. We are building a generic platform that offers reusable components for efficient storage, transformation, statistical analysis and presentation of scientific data stored in Microsoft SQL Server. Graywulf also addresses the distributed computing issues that arise with current RDBMS technologies. The current version supports load balancing of simple queries and parallel execution of partitioned queries over a set of mirrored databases. Uniform user access to the data is provided through a web-based query interface and a data surface for software clients. Queries are formulated in a slightly modified SQL syntax that offers a transparent view of the distributed data. The software library consists of several components that can be reused to develop complex scientific data warehouses: a system registry, administration tools for managing entire database server clusters, a sophisticated workflow execution framework, and a SQL parser library.
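The abstract gives no implementation details, so the following is only a minimal sketch of the general idea behind parallel execution of a partitioned query over mirrored databases. It is not Graywulf's code. Plain in-memory lists stand in for the mirrored SQL Server databases, and the names `partition_range`, `run_partition` and `partitioned_sum` are hypothetical. The query's key range is split into one contiguous sub-range per mirror, each mirror evaluates its share in parallel, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_range(lo, hi, n):
    """Split the half-open key range [lo, hi) into n contiguous sub-ranges."""
    step, extra = divmod(hi - lo, n)
    bounds, start = [], lo
    for i in range(n):
        # The first `extra` sub-ranges take one additional key each.
        end = start + step + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


def run_partition(mirror, key_range):
    """Stand-in for a partial query: sum `value` over rows whose key is in range."""
    lo, hi = key_range
    return sum(value for key, value in mirror if lo <= key < hi)


def partitioned_sum(mirrors, lo, hi):
    """Assign one partition to each mirror, run them in parallel, combine results."""
    ranges = partition_range(lo, hi, len(mirrors))
    with ThreadPoolExecutor(max_workers=len(mirrors)) as pool:
        partials = pool.map(run_partition, mirrors, ranges)
        return sum(partials)


# Three identical in-memory "mirrors" of a table of (key, value) rows (assumed data).
table = [(k, k * 2) for k in range(100)]
mirrors = [list(table) for _ in range(3)]
total = partitioned_sum(mirrors, 0, 100)
```

Because every mirror holds the full table, any mirror can serve any partition. That same property is what would allow simple, unpartitioned queries to be load balanced across the mirrors.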
