Web数据挖掘方法应用实践

P. Osipov, A. Borisov
{"title":"Web数据挖掘方法应用实践","authors":"P. Osipov, A. Borisov","doi":"10.2478/v10143-010-0014-x","DOIUrl":null,"url":null,"abstract":"Practice of Web Data Mining Methods Application Recent growth of information on the Internet imposes high demands on the effectiveness of processing algorithms. This paper discusses some algorithms from the field of Web Data Mining which have proved effective in many existing applications. The paper is divided into two logical parts; the first part provides a theoretical description of the algorithms, but the second one contains examples of their successful use to solve real problems. Search algorithms of vague duplicates of documents are currently actively used by all the leading search engines in the world. The paper describes the following algorithms: shingles, signature methods and image-based algorithms. Such methods of classification as a method of fuzzy clustering to-medium (Fuzzy cmeans/ FCM clustering) and clustering by ant colony (Standard Ant Clustering Algorithm SACA) are considered. In conclusion, the experience of the successful application of fuzzy clustering in conjunction with the software toolkit DataEngine to improve the efficiency of the bank \"BCI Bank\" is described as well as the sharing of the ant colony clustering method in conjunction with linear genetic programming to meet the increasing efficiency of predicting the load on the servers of high load Internet portal Monash Institut.","PeriodicalId":211660,"journal":{"name":"Sci. J. Riga Tech. Univ. Ser. Comput. Sci.","volume":"5 7","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1900-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Practice of Web Data Mining Methods Application\",\"authors\":\"P. Osipov, A. Borisov\",\"doi\":\"10.2478/v10143-010-0014-x\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Practice of Web Data Mining Methods Application Recent growth of information on the Internet imposes high demands on the effectiveness of processing algorithms. This paper discusses some algorithms from the field of Web Data Mining which have proved effective in many existing applications. The paper is divided into two logical parts; the first part provides a theoretical description of the algorithms, but the second one contains examples of their successful use to solve real problems. Search algorithms of vague duplicates of documents are currently actively used by all the leading search engines in the world. The paper describes the following algorithms: shingles, signature methods and image-based algorithms. Such methods of classification as a method of fuzzy clustering to-medium (Fuzzy cmeans/ FCM clustering) and clustering by ant colony (Standard Ant Clustering Algorithm SACA) are considered. In conclusion, the experience of the successful application of fuzzy clustering in conjunction with the software toolkit DataEngine to improve the efficiency of the bank \\\"BCI Bank\\\" is described as well as the sharing of the ant colony clustering method in conjunction with linear genetic programming to meet the increasing efficiency of predicting the load on the servers of high load Internet portal Monash Institut.\",\"PeriodicalId\":211660,\"journal\":{\"name\":\"Sci. J. Riga Tech. Univ. Ser. Comput. Sci.\",\"volume\":\"5 7\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1900-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Sci. J. Riga Tech. Univ. Ser. Comput. Sci.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2478/v10143-010-0014-x\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Sci. J. Riga Tech. Univ. Ser. Comput. Sci.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2478/v10143-010-0014-x","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

Web数据挖掘方法的实践应用近年来互联网上信息的增长对处理算法的有效性提出了很高的要求。本文讨论了Web数据挖掘领域的一些算法,这些算法在许多现有的应用中被证明是有效的。本文从逻辑上分为两个部分;第一部分提供了算法的理论描述,但第二部分包含了它们成功应用于解决实际问题的例子。文档模糊副本的搜索算法目前被世界上所有主要的搜索引擎所积极使用。本文介绍了以下算法:带状、签名方法和基于图像的算法。考虑了模糊聚类方法(fuzzy cmeans/ FCM聚类)和蚁群聚类(标准蚂蚁聚类算法SACA)等分类方法。最后,介绍了利用模糊聚类方法结合DataEngine软件工具包提高银行“BCI bank”效率的成功经验,以及利用蚁群聚类方法结合线性遗传规划来满足高负载互联网门户Monash institute服务器不断提高的负载预测效率。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Practice of Web Data Mining Methods Application
Practice of Web Data Mining Methods Application Recent growth of information on the Internet imposes high demands on the effectiveness of processing algorithms. This paper discusses some algorithms from the field of Web Data Mining which have proved effective in many existing applications. The paper is divided into two logical parts; the first part provides a theoretical description of the algorithms, but the second one contains examples of their successful use to solve real problems. Search algorithms of vague duplicates of documents are currently actively used by all the leading search engines in the world. The paper describes the following algorithms: shingles, signature methods and image-based algorithms. Such methods of classification as a method of fuzzy clustering to-medium (Fuzzy cmeans/ FCM clustering) and clustering by ant colony (Standard Ant Clustering Algorithm SACA) are considered. In conclusion, the experience of the successful application of fuzzy clustering in conjunction with the software toolkit DataEngine to improve the efficiency of the bank "BCI Bank" is described as well as the sharing of the ant colony clustering method in conjunction with linear genetic programming to meet the increasing efficiency of predicting the load on the servers of high load Internet portal Monash Institut.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信