{"title":"Towards building a MetaQuerier: extracting and matching Web query interfaces","authors":"Bin He, Zhen Zhang, K. Chang","doi":"10.1109/ICDE.2005.145","DOIUrl":null,"url":null,"abstract":"We witness the rapid growth and thus the prevalence of databases on the Web. Our recent study in April 2004 estimated 450,000 online databases. On this deep Web, myriad databases provide dynamic query-based data access through their query interfaces, instead of static URL links. It is thus essential to integrate these query interfaces for integrating the deep Web. The overall goal of the MetaQuerier project aims at opening up the deep Web to users, by building a system to help users exploring and integrating deep Web sources. In particular, to start with, we focus on the integration of deep Web sources in the same domain, which is itself an important integration task. To automate this integration scenario, we need to solve two critical problems: extracting query interfaces and matching query interfaces. To solve the interface extraction problem, we introduce a parsing paradigm by hypothesizing the existence of hidden syntax which describes the layout and semantic of Web interfaces. Also, unlike traditional pairwise schema matching, we propose a holistic matching approach, which matches all schemas at the same time with the hypothesis of a hidden schema model. Therefore, our techniques explore, in essence, \"data mining for information integration.\" That is, we mine the observable information to discover the underlying semantics.","PeriodicalId":297231,"journal":{"name":"21st International Conference on Data Engineering (ICDE'05)","volume":"303 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2005-04-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"16","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"21st International Conference on Data Engineering (ICDE'05)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2005.145","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 16
Abstract
We witness the rapid growth and thus the prevalence of databases on the Web. Our recent study in April 2004 estimated 450,000 online databases. On this deep Web, myriad databases provide dynamic query-based data access through their query interfaces, instead of static URL links. It is thus essential to integrate these query interfaces for integrating the deep Web. The overall goal of the MetaQuerier project aims at opening up the deep Web to users, by building a system to help users exploring and integrating deep Web sources. In particular, to start with, we focus on the integration of deep Web sources in the same domain, which is itself an important integration task. To automate this integration scenario, we need to solve two critical problems: extracting query interfaces and matching query interfaces. To solve the interface extraction problem, we introduce a parsing paradigm by hypothesizing the existence of hidden syntax which describes the layout and semantic of Web interfaces. Also, unlike traditional pairwise schema matching, we propose a holistic matching approach, which matches all schemas at the same time with the hypothesis of a hidden schema model. Therefore, our techniques explore, in essence, "data mining for information integration." That is, we mine the observable information to discover the underlying semantics.