A supervised visual wrapper generator for Web-data extraction
Xiaofeng Meng, Haiyan Wang, Dongdong Hu, Chen Li
Proceedings 27th Annual International Computer Software and Applications Conference. COMPAC 2003, 2003-11-03
https://doi.org/10.1109/CMPSAC.2003.1245412
Extracting data from Web pages using wrappers is a fundamental problem that arises in a wide variety of applications of practical interest. In this paper, we propose a novel schema-guided approach to wrapper generation. We provide a user-friendly interface that allows users to define the schema of the data to be extracted and to specify mappings from an HTML page to the target schema. Based on these mappings, the system automatically generates an extraction rule that extracts the data from the page. Our approach significantly reduces the manual effort involved in wrapper generation: the user never has to deal with the internal extraction rule or even be familiar with the details of HTML.
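The abstract does not describe the system's rule language or interface, so the following is only a minimal sketch of the general idea, under stated assumptions: the page is well-formed markup, the user has mapped each schema field to one element of a single sample record, and at least two fields are mapped. The names `SCHEMA`, `MAPPINGS`, `generate_rule`, and `extract` are hypothetical and not taken from the paper. The sketch derives a rule by taking the steps shared by all mapped paths as the record path and dropping the position on its last step so that it matches every record.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample page; assumed to be well-formed so ElementTree can parse it.
PAGE = """<html><body><table>
<tr><td>Title A</td><td>10.00</td></tr>
<tr><td>Title B</td><td>12.50</td></tr>
</table></body></html>"""

# Hypothetical user input: the target schema, and a mapping from each field
# to the path of one example element in the first record (positions are 1-based).
SCHEMA = ["title", "price"]
MAPPINGS = {
    "title": "body/table/tr[1]/td[1]",
    "price": "body/table/tr[1]/td[2]",
}


def generate_rule(schema, mappings):
    """Derive a rule: one shared record path plus a relative path per field.

    Assumes at least two fields whose paths share a non-empty prefix.
    """
    paths = [mappings[field].split("/") for field in schema]

    # The longest run of steps common to every mapped path locates the record.
    prefix = []
    for steps in zip(*paths):
        if len(set(steps)) != 1:
            break
        prefix.append(steps[0])

    # Drop the position on the last shared step so the path matches all records.
    last = re.sub(r"\[\d+\]$", "", prefix[-1])
    record_path = "/".join(prefix[:-1] + [last])

    fields = {f: "/".join(p[len(prefix):]) for f, p in zip(schema, paths)}
    return {"record": record_path, "fields": fields}


def extract(page, rule):
    """Apply the rule to a page and return one dict per record."""
    root = ET.fromstring(page)
    rows = []
    for record in root.findall(rule["record"]):
        row = {}
        for field, path in rule["fields"].items():
            node = record.find(path)
            row[field] = node.text if node is not None else None
        rows.append(row)
    return rows


rule = generate_rule(SCHEMA, MAPPINGS)
rows = extract(PAGE, rule)
```

Here `rule` comes out as the record path `body/table/tr` with the field paths `td[1]` and `td[2]`, and `rows` holds one dictionary per table row. A real wrapper generator would also need to handle malformed HTML, nested or optional fields, and pages whose layout varies between records, none of which this sketch attempts.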