Implementing web data extraction and making Mashup with Xtractorz

2010 IEEE 2nd International Advance Computing Conference (IACC) · Pub Date: 2010-03-01 · DOI: 10.1109/IADCC.2010.5422921

R. A. Gultom, R. F. Sari, B. Budiardjo


Citations: 15

Abstract



Web data extraction means extracting data directly from web pages, most of which are written in unstructured HTML, into a structured format such as XML or XHTML. In this paper we review the implementation of web data extraction and the stages of making a Mashup. We implement web data extraction by visually extracting targeted data from data sources (web pages). We then combine web data extraction with the stages of making a Mashup, such as data retrieval, data source modeling, data cleaning/filtering, data integration, and data visualization. Querying these data sources is problematic because the unstructured content of web pages (HTML) cannot be extracted directly into a structured form. To address this problem, we propose a system called Xtractorz that performs web data extraction in a Mashup format. It provides a fully visual and interactive user interface, built with a new technique and approach that uses PHP and AJAX as the programming languages and MySQL as the data repository. Furthermore, Xtractorz lets users do their work without writing a script or program, and without any knowledge of computer programming. The test results show that Xtractorz requires fewer steps to make a Mashup than RoboMaker and Karma.
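The Mashup stages named in the abstract can be illustrated with a small sketch. It is not Xtractorz's implementation (the paper describes a visual tool built on PHP, AJAX, and MySQL); it is a minimal Python illustration using only the standard library. All names and the sample HTML below are hypothetical, and the retrieval stage is simulated with inline strings rather than real web pages.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def extract_rows(html):
    """Extraction: pull raw cell text out of unstructured HTML."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def model_and_clean(rows, fields):
    """Modeling and cleaning: name the columns, trim text, drop bad rows."""
    records = []
    for row in rows:
        cells = [cell.strip() for cell in row]
        if len(cells) != len(fields) or not all(cells):
            continue  # header rows (no <td>) and rows with empty cells
        records.append(dict(zip(fields, cells)))
    return records


def integrate(left, right, key):
    """Integration: join two record lists on a shared key."""
    lookup = {rec[key]: rec for rec in right}
    return [{**rec, **lookup[rec[key]]} for rec in left if rec[key] in lookup]


def to_xml(records, root_tag="mashup", item_tag="item"):
    """Structured output: serialize the merged records as XML."""
    root = ET.Element(root_tag)
    for rec in records:
        item = ET.SubElement(root, item_tag)
        for name, value in rec.items():
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


# Retrieval is simulated: these strings stand in for two fetched web pages.
PRICES_HTML = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td> Dune </td><td>9.99</td></tr>
  <tr><td></td><td>1.00</td></tr>
  <tr><td>Emma</td><td>4.50</td></tr>
</table>
"""
RATINGS_HTML = (
    "<table><tr><td>Dune</td><td>5</td></tr>"
    "<tr><td>Ulysses</td><td>3</td></tr></table>"
)

prices = model_and_clean(extract_rows(PRICES_HTML), ["title", "price"])
ratings = model_and_clean(extract_rows(RATINGS_HTML), ["title", "rating"])
mashup = integrate(prices, ratings, "title")
xml_text = to_xml(mashup)
print(xml_text)
```

Only records present in both sources survive the join, so the printed XML contains a single item. The visualization stage is reduced here to printing the serialized XML; a real Mashup would render it in a user interface.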
