Differential Analysis on Deep Web Data Sources

2010 IEEE International Conference on Data Mining Workshops. Pub Date: 2010-12-13. DOI: 10.1109/ICDMW.2010.22

Tantan Liu, Fan Wang, Jiedan Zhu, G. Agrawal

Citations: 7

Abstract

The growing use of the Internet in everyday life has been creating new challenges and opportunities for data mining techniques. A relatively new trend on the Internet is the deep web. Because a large number of deep web data sources tend to provide similar data, an important problem is to perform offline analysis to understand the differences in the data available from different sources. This paper introduces data mining methods to extract a high-level summary of the differences in data provided by different deep web data sources. We consider patterns of values with respect to the same entity and formulate a new data mining problem, which we refer to as differential rule mining. We have developed an algorithm for mining such rules. Our method includes a pruning step that summarizes the identified differential rules. For efficiency, a hash table is used to accelerate the pruning process. We show the effectiveness, efficiency, and utility of our methods by analyzing data across four travel-related websites.
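
As a rough illustration only, the sketch below shows one way that comparing values for the same entity across two sources could be coded. It is not the algorithm from the paper: the abstract does not define the rule form or the pruning criteria, so the record layout, the `mine_differential_rules` function, the `price_a`/`price_b` fields, and the `min_count`/`min_conf` thresholds are all assumptions, and the sample records are made up. A condition is kept as a rule when source A is cheaper than source B for most matching entities, and a dict (hash table) of accepted rules is used to prune more specific conditions that a more general rule already covers.

```python
from collections import defaultdict
from itertools import combinations


def mine_differential_rules(records, attrs, min_count=2, min_conf=0.8):
    """Toy differential rule miner (hypothetical, not the paper's method).

    Each record describes one entity seen on two sources and holds the
    attributes in `attrs` plus 'price_a' and 'price_b'.
    Returns {condition: confidence that source A is cheaper}.
    """
    # stats[condition] = [entities matching, entities where A is cheaper]
    stats = defaultdict(lambda: [0, 0])
    for rec in records:
        items = tuple((a, rec[a]) for a in attrs)
        for size in (1, 2):  # conditions on one or two attributes
            for cond in combinations(items, size):
                stats[cond][0] += 1
                stats[cond][1] += rec["price_a"] < rec["price_b"]

    accepted = {}  # hash table of rules kept so far
    for cond in sorted(stats, key=len):  # general conditions first
        total, cheaper = stats[cond]
        if total < min_count or cheaper / total < min_conf:
            continue
        # Prune: skip a condition if a more general rule is already kept.
        # Each membership check is a constant-time hash lookup.
        if any(sub in accepted
               for r in range(1, len(cond))
               for sub in combinations(cond, r)):
            continue
        accepted[cond] = cheaper / total
    return accepted


# Made-up sample: the same five flights as listed on sources A and B.
sample = [
    {"airline": "X", "stops": 0, "price_a": 100, "price_b": 120},
    {"airline": "X", "stops": 0, "price_a": 95, "price_b": 105},
    {"airline": "X", "stops": 1, "price_a": 90, "price_b": 110},
    {"airline": "Y", "stops": 0, "price_a": 130, "price_b": 125},
    {"airline": "Y", "stops": 1, "price_a": 140, "price_b": 135},
]
rules = mine_differential_rules(sample, ["airline", "stops"])
```

In this toy run the only rule kept is "airline = X", and the more specific condition "airline = X and stops = 0" is pruned because the general rule already accounts for it. Processing general conditions first is a design choice of the sketch that keeps the pruning check to a handful of dictionary lookups per candidate.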
