Learning Logic Wrappers for Information Extraction from the Web
C. Bădică, E. Popescu, A. Bădică
2005 Symposium on Applications and the Internet Workshops (SAINT 2005 Workshops), 2005-01-31
DOI: 10.1109/SAINTW.2005.77
This paper discusses a methodology for applying general-purpose first-order inductive learning to information extraction from Web documents structured as unranked ordered trees. The methodology is applied to real-world sets of HTML pages that represent product information sheets, an important task in product data integration. It addresses two problems: defining information extraction rules in the form of logic wrappers, and mapping the task of learning these rules to general-purpose first-order inductive learning.
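
To make the terms in the abstract concrete, the following is a minimal sketch of what a logic wrapper over an unranked ordered tree could look like. It is not the authors' system: the rule here is written by hand rather than learned, and the names `Node`, `prev_sibling`, and `extract`, as well as the sample product table, are hypothetical and chosen only for illustration. The rule it encodes reads, in logic-programming style, roughly as `extract(N) :- tag(N, td), next(P, N), tag(P, td), text(P, Label)`.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    """A node of an unranked ordered tree: any number of ordered children."""
    tag: str
    text: str = ""
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self) -> None:
        # Record the parent link so sibling relations can be computed.
        for child in self.children:
            child.parent = self


def descendants(node: Node) -> Iterator[Node]:
    """Yield the node and all nodes below it in document order."""
    yield node
    for child in node.children:
        yield from descendants(child)


def prev_sibling(node: Node) -> Optional[Node]:
    """Return the sibling immediately to the left, or None if there is none."""
    if node.parent is None:
        return None
    siblings = node.parent.children
    i = siblings.index(node)
    return siblings[i - 1] if i > 0 else None


def extract(root: Node, label: str) -> List[str]:
    """Hand-written wrapper rule (an assumption, not a learned rule):
    select every <td> whose left sibling is a <td> containing `label`."""
    results = []
    for n in descendants(root):
        p = prev_sibling(n)
        if n.tag == "td" and p is not None and p.tag == "td" and p.text == label:
            results.append(n.text)
    return results


# Hypothetical product information sheet, already parsed into a tree.
doc = Node("table", children=[
    Node("tr", children=[Node("td", "Model"), Node("td", "X-100")]),
    Node("tr", children=[Node("td", "Weight"), Node("td", "2.5 kg")]),
])

print(extract(doc, "Weight"))
```

In the setting the abstract describes, a rule of this kind would be induced from annotated example pages by a first-order inductive learner instead of being coded manually; the sketch only shows the shape of the tree relations (tag, sibling order, text) that such a rule can refer to.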