Enabling information integration and workflows in a grid environment with automatic wrapper generation

The 6th IEEE/ACM International Workshop on Grid Computing, 2005. Publication date: 2005-11-13. DOI: 10.1109/GRID.2005.1542737

Xuan Zhang, G. Agrawal


Citations: 10

Abstract

As grid-based data repositories and data analysis services become more common, scientific data analysis often involves accessing multiple data sources and analyzing the data with a variety of analysis programs. One critical challenge is that data sources often hold the same type of data in a number of different formats, and the formats expected and generated by different data analysis services are also often distinct. We believe that the traditional approach to this problem, hand-written wrappers, is neither effective nor scalable in a grid environment. This paper presents a new approach that generates wrappers automatically to enable grid-based information integration and workflows. In this approach, a layout descriptor describes the data format of each data source, as well as the input and output formats of each tool or service. Efficient wrappers are then generated automatically for translation between any two data formats. Our design separates the wrapper generation service from wrapper execution. The wrapper generation service analyzes the layout descriptors and generates a WRAPINFO data structure. The wrapper comprises a set of application-independent modules that take the WRAPINFO data structure as input. We demonstrate our wrapper generation tool with two real case studies. Besides showing the effectiveness of our system, the experimental results from these two case studies show that the wrapper generation overhead is very small, that automatically generated wrappers scale well to large datasets, and that, in the one case where the comparison was possible, the execution time of our wrapper was within 30% of that of a hand-written one.
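The split between wrapper generation and wrapper execution described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not show the layout descriptor language or the contents of WRAPINFO, so `LayoutDescriptor`, `WrapInfo`, `generate_wrapinfo`, and `run_wrapper` are hypothetical names, and the formats are reduced to delimited text records with named fields.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class LayoutDescriptor:
    """Hypothetical, simplified descriptor: ordered field names plus a delimiter."""
    fields: List[str]
    delimiter: str


@dataclass
class WrapInfo:
    """Illustrative stand-in for WRAPINFO: a precomputed translation plan."""
    source_delimiter: str
    target_delimiter: str
    source_indices: List[int]  # for each target field, its position in a source record


def generate_wrapinfo(source: LayoutDescriptor, target: LayoutDescriptor) -> WrapInfo:
    """Generation step: analyze both descriptors once and emit a plan."""
    position = {name: i for i, name in enumerate(source.fields)}
    missing = [name for name in target.fields if name not in position]
    if missing:
        raise ValueError(f"target fields not present in source: {missing}")
    return WrapInfo(
        source_delimiter=source.delimiter,
        target_delimiter=target.delimiter,
        source_indices=[position[name] for name in target.fields],
    )


def run_wrapper(info: WrapInfo, records: Iterable[str]) -> Iterator[str]:
    """Execution step: application-independent, driven only by the plan."""
    for record in records:
        values = record.split(info.source_delimiter)
        yield info.target_delimiter.join(values[i] for i in info.source_indices)


# Usage: translate comma-separated records into a tab-separated subset.
source = LayoutDescriptor(fields=["id", "lat", "lon", "temp"], delimiter=",")
target = LayoutDescriptor(fields=["temp", "id"], delimiter="\t")
info = generate_wrapinfo(source, target)
translated = list(run_wrapper(info, ["7,1.5,2.5,300", "8,0.0,9.0,280"]))
```

In this sketch the descriptors are analyzed only once, in `generate_wrapinfo`, and `run_wrapper` contains no format-specific logic. That mirrors the separation the abstract describes, though the actual system handles far richer layouts than delimited text.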


