Semi-automatic wrapper generation for Internet information sources
N. Ashish, Craig A. Knoblock
Proceedings of CoopIS 97: 2nd IFCIS Conference on Cooperative Information Systems, 1997-06-24
DOI: 10.1109/COOPIS.1997.613813 (https://doi.org/10.1109/COOPIS.1997.613813)
Citations: 209
Abstract
To simplify the task of obtaining information from the vast number of sources available on the World Wide Web (WWW), the authors are building information mediators that extract and integrate data from multiple Web sources. In a mediator-based approach, a wrapper is built around each information source to translate between the mediator's query language and that source. The authors present an approach for semi-automatically generating wrappers for structured Internet sources. The key idea is to exploit the formatting information in Web pages to hypothesize the underlying structure of a page. From this structure, the system generates a wrapper that supports querying the source and, possibly, integrating it with other sources. Using their implemented wrapper-generation toolkit, they demonstrate how easily wrappers can be built for a number of Web sources.
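
The key idea in the abstract, using formatting cues to guess a page's structure and then querying through that structure, can be illustrated with a small sketch. This is not the authors' toolkit or algorithm: the choice of heading tags, the class `StructureGuesser`, the function `build_wrapper`, and the sample page are all hypothetical, and a real wrapper generator would let a user correct the guessed structure (the "semi-automatic" part).

```python
from html.parser import HTMLParser

# Assumption: text inside these tags marks the start of a new section.
HEADING_TAGS = {"h1", "h2", "h3", "h4", "b", "strong"}


class StructureGuesser(HTMLParser):
    """Splits a page into heading -> body-text sections using formatting cues."""

    def __init__(self):
        super().__init__()
        self.sections = {}        # heading text -> list of body text fragments
        self._in_heading = 0      # depth of currently open heading-like tags
        self._heading_parts = []  # text fragments of the heading being read
        self._current = None      # heading that body text is filed under

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            if self._in_heading == 0:
                self._heading_parts = []
            self._in_heading += 1

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._in_heading > 0:
            self._in_heading -= 1
            if self._in_heading == 0:
                heading = " ".join(self._heading_parts).strip()
                if heading:
                    self._current = heading
                    self.sections.setdefault(heading, [])

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_heading:
            self._heading_parts.append(text)
        elif self._current is not None:
            self.sections[self._current].append(text)


def build_wrapper(html):
    """Returns a lookup function over the hypothesized sections of one page."""
    parser = StructureGuesser()
    parser.feed(html)
    parser.close()
    sections = {k.lower(): " ".join(v) for k, v in parser.sections.items()}

    def query(field):
        # Returns the section text, or None if no such section was guessed.
        return sections.get(field.lower())

    return query


# Made-up sample page, for illustration only.
page = """
<html><body>
<h2>Geography</h2><p>Landlocked country in Central Europe.</p>
<h2>Economy</h2><p>Exports include machinery.</p><p>Currency: koruna.</p>
</body></html>
"""
query = build_wrapper(page)
print(query("economy"))
```

In this sketch the "wrapper" is just the `query` closure: a mediator could call it with a field name instead of parsing the page itself. Pages whose structure is carried by font size, indentation, or tables would need different cues than the heading tags assumed here.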