Authors: Zhao Li, W. Ng
Published in: Proceedings. 11th IEEE International Conference and Workshop on the Engineering of Computer-Based Systems, 2004.
Publication date: 2004-05-24
DOI: https://doi.org/10.1109/ECBS.2004.1316686
WICCAP: from semi-structured data to structured data
Web data extraction is a technique for extracting and integrating data from Web-based semistructured sources. Wrappers act as the kernel of Web data extraction systems, mediating between users and a large number of heterogeneous data sources. Typically, they process semistructured documents that are generated from structured databases according to rules usually hidden from users. Much research has gone into representing the knowledge of these hidden rules and into techniques such as grammar induction and inductive logic programming that discover them, so that wrappers can use the rules to extract data. An important property of semistructured data is its hierarchical structure. Intuitively, this structural information can itself be used to generate wrappers. We describe a Web data extraction system, WICCAP, and its internal Web Data Extraction Language (WDEL), which provides a unified view of Web data resources and extracted data. We describe some rule generation features of WICCAP and give a detailed description of the internal language and its implementation. We have conducted experiments to show the ease of generating wrappers with this approach.
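To make the idea of exploiting hierarchical structure concrete, the following is a minimal sketch of a wrapper driven by a nested rule set. It is not WDEL and does not reproduce WICCAP's implementation; the sample page, the `RULES` layout, and the `extract` and `run_demo` functions are all assumptions invented for illustration. Each rule level names a path to the repeated elements, the fields to read from each match, and optional child rules applied inside that match.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed page fragment standing in for a real Web page.
PAGE = """
<html><body>
  <div class="section">
    <h2>World</h2>
    <ul>
      <li><a href="/w/1">Headline A</a><span>Summary A</span></li>
      <li><a href="/w/2">Headline B</a><span>Summary B</span></li>
    </ul>
  </div>
  <div class="section">
    <h2>Sport</h2>
    <ul>
      <li><a href="/s/1">Headline C</a><span>Summary C</span></li>
    </ul>
  </div>
</body></html>
"""

# Hypothetical hierarchical rule set (an assumption, not WDEL syntax).
# Each field maps to (relative path, attribute name or None for element text).
RULES = {
    "path": ".//div[@class='section']",
    "fields": {"name": ("h2", None)},
    "children": {
        "articles": {
            "path": "ul/li",
            "fields": {
                "title": ("a", None),
                "link": ("a", "href"),
                "summary": ("span", None),
            },
        }
    },
}


def extract(node, rule):
    """Apply one rule level to `node`, recursing into child rules."""
    records = []
    for match in node.findall(rule["path"]):
        record = {}
        for field, (path, attr) in rule.get("fields", {}).items():
            target = match.find(path)
            if target is None:
                record[field] = None  # field missing in this record
            elif attr is None:
                record[field] = (target.text or "").strip()
            else:
                record[field] = target.get(attr)
        # Child rules are evaluated relative to the matched element,
        # so the output mirrors the hierarchy of the source document.
        for name, child_rule in rule.get("children", {}).items():
            record[name] = extract(match, child_rule)
        records.append(record)
    return records


def run_demo():
    root = ET.fromstring(PAGE.strip())
    return extract(root, RULES)


if __name__ == "__main__":
    for section in run_demo():
        print(section["name"], [a["title"] for a in section["articles"]])
```

The nested dictionaries returned by `extract` follow the same section-to-article hierarchy as the page. Real Web pages are rarely well-formed XML, so a practical wrapper would need a tolerant HTML parser in place of `xml.etree.ElementTree`.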