Practice of Web Data Mining Methods Application

Sci. J. Riga Tech. Univ. Ser. Comput. Sci. Pub Date : 1900-01-01 DOI:10.2478/v10143-010-0014-x

P. Osipov, A. Borisov


Citations: 1

Abstract

The recent growth of information on the Internet places high demands on the effectiveness of processing algorithms. This paper discusses several algorithms from the field of Web Data Mining that have proved effective in many existing applications. The paper is divided into two logical parts: the first gives a theoretical description of the algorithms, and the second presents examples of their successful use on real problems. Algorithms for finding near-duplicate documents are actively used by the leading search engines. The paper describes the following algorithms of this kind: shingles, signature methods, and image-based algorithms. Among classification methods, fuzzy c-means (FCM) clustering and ant colony clustering (Standard Ant Clustering Algorithm, SACA) are considered. Finally, the paper describes the successful application of fuzzy clustering together with the DataEngine software toolkit to improve the efficiency of the bank "BCI Bank", as well as the combined use of ant colony clustering and linear genetic programming to improve the prediction of server load for the high-load Internet portal of Monash Institut.
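The abstract names shingles as one of the near-duplicate detection methods but does not spell out the procedure. As a rough illustration only, the sketch below shows one common form of the idea: split each document into overlapping word windows (shingles) and compare the resulting sets with Jaccard similarity. The function names, the word-level tokenization, and the window size `k=3` are assumptions made for this example, not details taken from the paper.

```python
def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) of text.

    Tokenization by lowercasing and whitespace splitting is an assumption
    of this sketch; real systems usually normalize text more carefully.
    """
    words = text.lower().split()
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; taken to be 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def resemblance(doc_a, doc_b, k=3):
    """Estimate how similar two documents are via their shingle sets."""
    return jaccard(shingles(doc_a, k), shingles(doc_b, k))


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog"
    edited = "the quick brown fox leaps over the lazy dog"
    print(resemblance(original, edited))
```

A pair of documents would typically be flagged as near-duplicates when the resemblance exceeds a chosen threshold; the threshold itself depends on the application and is not specified here.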

