Data Source Selection for Large-Scale Deep Web Data Integration

2009 Second Pacific-Asia Conference on Web Mining and Web-based Application. Pub Date: 2009-06-06. DOI: 10.1109/WMWA.2009.25

Xuefeng Xian, Pengpeng Zhao, Wei Fang, Jie Xin, Zhiming Cui


Citations: 6

Abstract

The deep web has become an important resource on the web because of its rich, high-quality information, giving rise to a new application area in data mining and integration. Hundreds or thousands of data sources on the web may provide data relevant to a particular domain, so a primary challenge for large-scale deep web data integration is to determine the order in which a user should integrate candidate data sources. In this paper, we develop a most-benefit approach (MBA) for ordering candidate data sources for user integration. At the core of this approach is a utility function that quantifies the utility of a given state of the integration system; we therefore devise a utility function for the integration system based on the number of query results. We show in practice how to apply MBA efficiently in concert with this utility function to order data sources. A detailed experimental evaluation on real datasets shows that the ordering of data sources produced by this MBA-based method yields an integration system with significantly higher utility than a wide range of other ordering strategies.
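The abstract does not give the details of MBA or of its utility function, so the following is only a minimal sketch of one plausible reading: a greedy ordering in which the next source integrated is the one that adds the most utility, with utility taken to be the number of distinct results the integrated sources return for a sample query workload. The function names (`utility`, `most_benefit_order`), the input layout, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def utility(integrated: List[str], results: Dict[str, Set[str]]) -> int:
    """Assumed utility of an integration state: the number of distinct
    query results returned by the sources integrated so far."""
    covered: Set[str] = set()
    for name in integrated:
        covered |= results[name]
    return len(covered)


def most_benefit_order(results: Dict[str, Set[str]]) -> List[str]:
    """Greedily order sources by marginal utility gain (illustrative only)."""
    order: List[str] = []
    remaining = sorted(results)  # sorted so that ties break deterministically
    while remaining:
        base = utility(order, results)
        # Pick the source whose integration raises utility the most.
        best = max(remaining, key=lambda s: utility(order + [s], results) - base)
        order.append(best)
        remaining.remove(best)
    return order


# Hypothetical sample: result identifiers each source returns for a workload.
sample = {
    "source_a": {"r1", "r2", "r3"},
    "source_b": {"r3", "r4"},
    "source_c": {"r1", "r2", "r3", "r4", "r5"},
    "source_d": {"r6"},
}
ordering = most_benefit_order(sample)
print(ordering)
```

In this toy data, `source_d` returns only one result but is ranked second, because its result is not covered by `source_c`; `source_a` and `source_b` add nothing new once `source_c` is integrated. Ranking by marginal gain rather than by raw result count is the design choice the sketch is meant to illustrate.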

