Implementing web data extraction and making Mashup with Xtractorz

R. A. Gultom, R. F. Sari, B. Budiardjo
DOI: 10.1109/IADCC.2010.5422921 (https://doi.org/10.1109/IADCC.2010.5422921)
Published in: 2010 IEEE 2nd International Advance Computing Conference (IACC), 2010-03-01
Citations: 15

Abstract

Web data extraction means extracting data directly from web pages, most of which are written in unstructured HTML, into a new structured format such as XML or XHTML. In this paper we review the implementation of web data extraction and the stages of making a Mashup. We implement web data extraction by visually extracting targeted data from data sources (web pages). We then combine web data extraction with the stages of making a Mashup, such as data retrieval, data source modeling, data cleaning/filtering, data integration, and data visualization. Because the contents of web pages (HTML) are unstructured, querying these data sources is problematic, and data cannot be extracted directly into a new structured form. To address this problem, we propose a system called Xtractorz that performs web data extraction in a Mashup format. We provide a fully visual and interactive user interface with a new technique and approach, using PHP and AJAX as the programming languages and MySQL as the data repository. Furthermore, Xtractorz lets users do their work without writing a script or program, and without any knowledge of computer programming. The test results show that Xtractorz requires fewer steps to make a Mashup than RoboMaker and Karma.
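To make the idea of turning unstructured HTML into structured XML concrete, here is a minimal sketch of the extraction and cleaning/filtering stages. It is not Xtractorz itself: the abstract describes a visual PHP/AJAX tool backed by MySQL, whereas this uses Python's standard library, and the class name `TableRowParser`, the helper functions, the sample table, and the field names `title` and `price` are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None outside a row
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Cleaning: collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Filtering: keep only rows with at least one non-empty <td>.
            # Header rows built from <th> cells are dropped as a side effect.
            if any(self._row):
                self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Extraction stage: parse HTML text and return the cleaned table rows."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def rows_to_xml(rows, fields):
    """Emit the rows as XML, one <item> per row, one child element per field."""
    root = ET.Element("items")
    for row in rows:
        item = ET.SubElement(root, "item")
        for name, value in zip(fields, row):
            ET.SubElement(item, name).text = value
    return ET.tostring(root, encoding="unicode")


# Sample page fragment (made up for this sketch; no network access needed).
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td> Web   Mining </td><td>30</td></tr>
  <tr><td></td><td></td></tr>
  <tr><td>Data Integration</td><td>45</td></tr>
</table>
"""

rows = extract_rows(SAMPLE_HTML)
xml_text = rows_to_xml(rows, ["title", "price"])
print(xml_text)
```

In this sketch the target data is chosen by a hard-coded rule (every `<td>` inside a `<tr>`). A visual tool of the kind the abstract describes would let the user pick the target elements interactively instead, and the later Mashup stages (integration and visualization) would consume the resulting XML.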