Effective Schema Extraction of Query Interfaces on the Deep Web
Authors: Bao-hua Qiang, Jian-qing Xi, Ling Chen
Published in: 2008 Fifth International Conference on Fuzzy Systems and Knowledge Discovery, 2008-10-18
DOI: 10.1109/FSKD.2008.135 (https://doi.org/10.1109/FSKD.2008.135)
Citations: 9
Abstract
The Deep Web is becoming a very important information resource. Unlike in traditional Web information retrieval, content on the Deep Web is accessible only through source query interfaces. However, for any domain of interest, users may need to access many query interfaces to obtain the desired information, which is time-consuming. This motivates building an integrated query interface over the sources. The first important task toward this goal is schema extraction from the source query interfaces. In this paper, we present a novel pre-clustering algorithm with proper grouping patterns to obtain a partial clustering of attributes. Our approach avoids producing incorrect subsets when grouping attributes. Experimental results show that our approach is highly effective for schema extraction of source query interfaces on the Deep Web.