Implementing web data extraction and making Mashup with Xtractorz
R. A. Gultom, R. F. Sari, B. Budiardjo
2010 IEEE 2nd International Advance Computing Conference (IACC), 2010-03-01
DOI: 10.1109/IADCC.2010.5422921 (https://doi.org/10.1109/IADCC.2010.5422921)
Implementing web data extraction means that we can directly extract data from various web pages, which are mostly in an unstructured HTML format, into a new structured format such as XML or XHTML. In this paper we review the implementation of web data extraction and the stages in making a Mashup. We implement web data extraction by visually extracting targeted data from data sources (web pages). We then combine web data extraction with the stages of making a Mashup, e.g. data retrieval, data source modeling, data cleaning/filtering, data integration and data visualization. Problems arise when querying data sources: because the content of web pages (HTML) is unstructured, we cannot directly extract data into a new structured form. To address this problem, we propose a system, called Xtractorz, that can perform web data extraction in a Mashup format. We provide a fully visual and interactive user interface with a new technique and approach, implemented with PHP and AJAX, and with MySQL as the data repository. Furthermore, Xtractorz enables users to do their job without writing a script or program, and even without any knowledge of computer programming. The test results show that Xtractorz requires fewer steps to make a Mashup than RoboMaker and Karma.
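To make the idea of turning unstructured HTML into a structured format more concrete, the following is a minimal sketch in Python, using only the standard library. It is not the Xtractorz implementation, which the abstract describes as a visual tool built with PHP, AJAX and MySQL. The class `CellExtractor`, the function `html_table_to_xml`, the sample page and the field names are all hypothetical and chosen only for illustration. The sketch touches three of the stages named above in a simplified way: data retrieval (parsing the page), data cleaning/filtering (dropping incomplete rows) and output in a structured form (XML).

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class CellExtractor(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows made only of <th> cells stay empty
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_table_to_xml(html, fields):
    """Extract table rows from HTML and return them as an XML string."""
    parser = CellExtractor()
    parser.feed(html)
    parser.close()

    root = ET.Element("records")
    for row in parser.rows:
        if len(row) != len(fields):
            continue  # cleaning/filtering: skip rows that do not match the model
        record = ET.SubElement(root, "record")
        for name, value in zip(fields, row):
            ET.SubElement(record, name).text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical sample page; a real data source would be fetched over HTTP.
SAMPLE_HTML = """
<table>
  <tr><th>Title</th><th>Price</th></tr>
  <tr><td>Book A</td><td> 10 </td></tr>
  <tr><td>Broken row</td></tr>
  <tr><td>Book B</td><td>12</td></tr>
</table>
"""

xml_output = html_table_to_xml(SAMPLE_HTML, ["title", "price"])
print(xml_output)
```

In this sketch the list of field names plays the role of a very simple data source model: only rows with exactly that many cells are kept. A visual tool such as the one described in the abstract would let the user select the target data on the page instead of writing such code.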