WebView: a tool for retrieving internal structures and extracting information from HTML documents

S. Lim, Yiu-Kai Ng
Published in: Proceedings. 6th International Conference on Advanced Systems for Advanced Applications, 1999-04-19
DOI: 10.1109/DASFAA.1999.765738 (https://doi.org/10.1109/DASFAA.1999.765738)
Citations: 10

Abstract

HTML is a well-accepted and widely used language for creating platform-independent documents to be posted on the Web, and HTML documents are semistructured in nature according to the HTML specification. We propose a tool, called WebView, which constructs the semistructured data graph (SDG) of an HTML document H to capture the internal structure of data embedded in H and in its (in)directly linked documents. On top of the SDG, WebView provides query processing capability for evaluating SQL-like queries that are posted against the SDG, i.e., the source document(s), for extracting information from the SDG. Existing methods for extracting structured information from certain HTML documents with static internal structure, such as wrappers and integrators for data warehousing, can benefit from WebView.
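The abstract does not specify how WebView builds its SDG or what its SQL-like query syntax looks like, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes an SDG can be approximated as a tree of element nodes (containment edges) plus `href` link edges to other documents, and it stands in for a SQL-like query with a hypothetical `select(root, path, where)` that behaves roughly like `SELECT text FROM <path> WHERE <predicate>`. The names `SDGNode`, `build_sdg`, `select`, and `linked_documents` are illustrative inventions; only Python's standard `html.parser.HTMLParser` is a real API here, and linked documents are recorded but not fetched.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser

# Hypothetical node of a simplified semistructured data graph (SDG):
# containment edges to child elements plus link edges (href targets)
# pointing at other documents.
@dataclass
class SDGNode:
    label: str                                              # tag name, e.g. "td"
    text: str = ""                                          # text directly inside the element
    links: list[str] = field(default_factory=list)          # outgoing hyperlink targets
    children: list["SDGNode"] = field(default_factory=list)

# Void elements have no end tag, so they are never pushed on the open-element stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class SDGBuilder(HTMLParser):
    """Builds an SDGNode tree from one HTML document (linked pages are not fetched)."""

    def __init__(self) -> None:
        super().__init__()
        self.root = SDGNode("document")
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = SDGNode(tag)
        href = dict(attrs).get("href")
        if tag == "a" and href:
            node.links.append(href)
        self.stack[-1].children.append(node)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this tag; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].label == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        text = data.strip()
        if text:
            node = self.stack[-1]
            node.text = f"{node.text} {text}".strip()

def build_sdg(html: str) -> SDGNode:
    builder = SDGBuilder()
    builder.feed(html)
    builder.close()
    return builder.root

def select(root: SDGNode, path: str, where=None) -> list[str]:
    """Hypothetical SQL-like query: text of nodes whose label path ends with `path`."""
    wanted = path.split("/")
    rows = []

    def walk(node, trail):
        trail = trail + [node.label]
        if trail[-len(wanted):] == wanted and (where is None or where(node)):
            rows.append(node.text)
        for child in node.children:
            walk(child, trail)

    walk(root, [])
    return rows

def linked_documents(root: SDGNode) -> list[str]:
    """Collects link edges depth-first; these would lead to (in)directly linked documents."""
    found = list(root.links)
    for child in root.children:
        found.extend(linked_documents(child))
    return found

page = """
<html><body>
  <h1>Staff</h1>
  <table>
    <tr><td>Lim</td><td><a href="lim.html">home</a></td></tr>
    <tr><td>Ng</td><td><a href="ng.html">home</a></td></tr>
  </table>
</body></html>
"""
sdg = build_sdg(page)
print(select(sdg, "tr/td", where=lambda n: bool(n.text)))  # ['Lim', 'Ng']
print(linked_documents(sdg))                                 # ['lim.html', 'ng.html']
```

Matching on label paths rather than absolute positions is one plausible way to query documents with a static internal structure, which is the case the abstract says wrappers and data-warehouse integrators could exploit; a fuller version would follow the collected links and merge each linked page's tree into the same graph.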