A Web data extraction description language and its implementation
I-Chen Wu, J. Su, Loon-Been Chen
29th Annual International Computer Software and Applications Conference (COMPSAC'05), 2005-07-26
DOI: 10.1109/COMPSAC.2005.38
Citations: 5
Abstract
A data extraction model, named the browser-oriented data extraction (BODE) model, was proposed by I-Chen Wu et al. (2005) to extract Web contents with script functions. In this model, the system is built on top of browsers and accesses pages by simulating users' operations in the browser. Based on this model, this paper defines a scripting language, named the BODED (browser-oriented data extraction description) language, which tells the system how to perform data extraction. This paper also proposes a technique, called indirect browser replication, to implement a BODE system, and optimizes the performance of this technique.