Finding the WDB's Query Interface in Deep Web Automatically
Peiguang Lin, Ru-zhi Xu, Zhimin Hong, Yan Zhang
2008 International Conference on Internet Computing in Science and Engineering
Published: 2008-01-28
DOI: 10.1109/ICICSE.2008.77 (https://doi.org/10.1109/ICICSE.2008.77)
Citations: 6
Abstract
Web search engines work well for finding crawlable pages, but not for finding datasets hidden behind Web search forms. On this deep Web, many sources are structured: they provide structured query interfaces and return structured results. Organizing such structured sources into a domain hierarchy that users can browse to find these valuable resources is one of the critical steps toward the large-scale integration of heterogeneous deep Web sources. We propose an automatic classification of structured deep Web sources based on the features available on their search interfaces. Our experimental results show that the method presented in this paper is practical and provides a sound basis for further research on the deep Web.
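
To make the idea of classifying a source from its search interface concrete, the following is a minimal sketch and not the authors' method: the abstract does not specify which features or classifier the paper uses. It assumes the features are words taken from field names and visible labels inside a form, and it swaps in a simple keyword-overlap score for whatever classifier the paper actually applies. The domain vocabularies in `DOMAIN_KEYWORDS` and the names `FormFeatureExtractor`, `tokenize`, and `classify_form` are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical domain vocabularies, for illustration only.
DOMAIN_KEYWORDS = {
    "books": {"title", "author", "isbn", "publisher"},
    "airfare": {"departure", "destination", "passengers", "return"},
    "autos": {"make", "model", "mileage", "year"},
}


def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


class FormFeatureExtractor(HTMLParser):
    """Collects words from field names and visible text inside <form> elements."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:
                self.tokens.extend(tokenize(name))

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False

    def handle_data(self, data):
        # Visible text (labels, captions) only counts when it is inside a form.
        if self.in_form:
            self.tokens.extend(tokenize(data))


def classify_form(html):
    """Return the domain whose vocabulary overlaps most with the form's words,
    or None when no vocabulary word appears in any form on the page."""
    extractor = FormFeatureExtractor()
    extractor.feed(html)
    words = set(extractor.tokens)
    scores = {domain: len(vocab & words) for domain, vocab in DOMAIN_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Called on a page whose form has fields labelled Title, Author, and ISBN, `classify_form` would pick the hypothetical `books` domain; a page with no form yields `None`. A realistic system would need richer features and a trained classifier, which this sketch deliberately leaves out.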