A supervised visual wrapper generator for Web-data extraction

Proceedings 27th Annual International Computer Software and Applications Conference. COMPAC 2003. Pub Date: 2003-11-03. DOI: 10.1109/CMPSAC.2003.1245412

Xiaofeng Meng, Haiyan Wang, Dongdong Hu, Chen Li


Citations: 10

Abstract

Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of great practical interest. In this paper, we propose a novel schema-guided approach to wrapper generation. We provide a user-friendly interface that allows users to define the schema of the data to be extracted and to specify mappings from an HTML page to the target schema. Based on these mappings, the system automatically generates an extraction rule that extracts the data from the page. This approach can significantly reduce the human effort that wrapper generation requires. The user never has to deal with the internal extraction rule, or even be familiar with the details of HTML.
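The abstract does not describe the rule language or the mapping format, so the following is only a minimal sketch of the general schema-guided idea, not the authors' implementation. It assumes a well-formed page that `xml.etree.ElementTree` can parse, and every name in it (`PAGE`, `SCHEMA`, `MAPPING`, `generate_rule`, `extract`) is hypothetical. The user's input is modeled as a schema plus a mapping that points at one example record; the rule is obtained by dropping the positional index on the record path so that it matches every sibling record.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (real HTML often needs a lenient parser).
PAGE = """<html><body><table>
<tr><td>Database Systems</td><td>59.00</td></tr>
<tr><td>Web Data Mining</td><td>42.50</td></tr>
<tr><td>XML in Practice</td><td>30.00</td></tr>
</table></body></html>"""

# Hypothetical target schema: the fields the user wants to extract.
SCHEMA = ["title", "price"]

# Hypothetical mapping, as a visual tool might record it after the user
# points at ONE example record and at the node holding each field.
MAPPING = {
    "record": "body/table/tr[1]",                    # path to the example record
    "fields": {"title": "td[1]", "price": "td[2]"},  # paths relative to the record
}


def generate_rule(schema, mapping):
    """Generalize a single-example mapping into an extraction rule."""
    missing = [f for f in schema if f not in mapping["fields"]]
    if missing:
        raise ValueError(f"unmapped schema fields: {missing}")
    # Drop the trailing positional index so the path matches all sibling records.
    record_path = re.sub(r"\[\d+\]$", "", mapping["record"])
    return {
        "record": record_path,
        "fields": {f: mapping["fields"][f] for f in schema},
    }


def extract(rule, page):
    """Apply the rule to a page and return one dict per record."""
    root = ET.fromstring(page)
    rows = []
    for rec in root.findall(rule["record"]):
        row = {}
        for name, path in rule["fields"].items():
            node = rec.find(path)
            has_text = node is not None and node.text
            row[name] = node.text.strip() if has_text else None
        rows.append(row)
    return rows


rule = generate_rule(SCHEMA, MAPPING)
records = extract(rule, PAGE)
for r in records:
    print(r)
```

In this sketch the user only supplies `SCHEMA` and `MAPPING`; the rule itself stays internal, which mirrors the abstract's point that users never handle extraction rules directly.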

