Wraplet: Wrapping Your Web Contents with a Lightweight Language
Natsumi Sawa, Atsuyuki Morishima, S. Sugimoto, H. Kitagawa
2007 Third International IEEE Conference on Signal-Image Technologies and Internet-Based System, published 2007-12-16
DOI: 10.1109/SITIS.2007.135
Citation count: 7
Abstract
Wrapping Web sources is known to be one of the key tasks in information integration. This paper proposes Wraplet, a wrapping language for extracting structured data from Web content written in HTML. Unlike existing solutions, Wraplet is designed as a lightweight language: users can easily write wrapping scripts in an ordinary text editor. Its simple syntax and a library of useful patterns help users write wrapping descriptions by hand. We explain the motivation for its development and the language design, and then present the results of a preliminary experiment on the language's applicability to real Web sources. A statistical analysis shows that, in the experimental setting, the applicability of Wraplet exceeds 90% at the 95% confidence level.
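The abstract describes wrapping only in general terms: a wrapper turns repeated structure in an HTML page into structured records. The Python sketch below illustrates that idea for one common case, rows of an HTML table, using only the standard-library `html.parser` module. `TableWrapper`, `wrap`, and the sample page are hypothetical names chosen for illustration; the code makes no claim about Wraplet's actual syntax, pattern library, or semantics.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Hypothetical wrapper: maps HTML table rows to records (not Wraplet syntax)."""

    def __init__(self):
        super().__init__()
        self.headers = []    # field names from the most recent all-<th> row
        self.records = []    # one dict per data row
        self._cells = None   # (text, tag) pairs of the row being parsed
        self._text = None    # text fragments of the cell being parsed
        self._kind = None    # "th" or "td" for the open cell

    def _close_cell(self):
        # Finish the open cell, if any; HTML lets authors omit </td> and </th>.
        if self._text is not None and self._cells is not None:
            self._cells.append(("".join(self._text).strip(), self._kind))
        self._text = None

    def _close_row(self):
        # Turn the finished row into either a header row or a record.
        self._close_cell()
        if self._cells:
            values = [text for text, _ in self._cells]
            if all(kind == "th" for _, kind in self._cells):
                self.headers = values
            elif self.headers:
                self.records.append(dict(zip(self.headers, values)))
        self._cells = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._close_row()  # an unclosed previous row ends here
            self._cells = []
        elif tag in ("th", "td") and self._cells is not None:
            self._close_cell()
            self._text, self._kind = [], tag

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._close_cell()
        elif tag in ("tr", "table"):
            self._close_row()


def wrap(html):
    """Extract structured records from an HTML string."""
    parser = TableWrapper()
    parser.feed(html)
    parser.close()
    return parser.records


# Sample page; the last row omits its closing tags, which HTML permits.
page = """
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget</td><td>9.99</td></tr>
  <tr><td>Gadget<td>4.50
</table>
"""
print(wrap(page))
# → [{'Name': 'Widget', 'Price': '9.99'}, {'Name': 'Gadget', 'Price': '4.50'}]
```

A hand-written wrapper like this one is tied to a single layout and must be rewritten whenever the layout changes. That maintenance burden is what a dedicated wrapping language with reusable patterns aims to reduce.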