Accessing the web: from search to integration

K. Chang, Junghoo Cho
Proceedings of the 2006 ACM SIGMOD international conference on Management of data. Published 2006-06-27. DOI: 10.1145/1142473.1142601 (https://doi.org/10.1145/1142473.1142601)
Citations: 37

Abstract

We have witnessed the rapid growth of the Web: it has not only "broadened" but also "deepened." While the "surface Web" has expanded from the 1999 estimate of 800 million pages to the 19.2 billion pages recently reported in the Yahoo index, an equally or even more significant amount of information is hidden on the "deep Web," behind the query forms of online databases, recently estimated at over 1.2 million. Accessing the information on the Web thus requires not only search, to locate pages of interest on the surface Web, but also integration, to aggregate data from alternative or complementary sources on the deep Web. Although the opportunities are unprecedented, the challenges are also immense. On the one hand, for the surface Web, while search seems to have evolved into a standard technology, its maturity and pervasiveness have also invited spam attacks and a demand for personalization. On the other hand, for the deep Web, while the proliferation of structured sources promises unlimited possibilities for more precise and aggregated access, it also presents new challenges for realizing large-scale, dynamic information integration. These issues are in essence related to data management at a large scale, and thus present novel problems and interesting opportunities for our research community. This tutorial will discuss the new access scenarios and research problems in Web information access: from search of the surface Web to integration of the deep Web.