Similarity Webpage Denoising Data Clustering Algorithm Based on Time Series

Hang Chun-mei, Wu Yang-yang
2015 Seventh International Conference on Measuring Technology and Mechatronics Automation, published 2015-06-13. DOI: 10.1109/ICMTMA.2015.240 (https://doi.org/10.1109/ICMTMA.2015.240)
Citations: 0

Abstract

Large volumes of non-stationary Web page data, or of Web page data that are not first-order sequences, typically exhibit a very high noise ratio, and empirical mode decomposition (EMD) is often chosen to process them. Applying EMD to the sequence data yields a set of intrinsic mode functions (IMFs) and a residual series. Each IMF contains the local characteristics of the data over a different time range, which makes it possible to remove impurities (noise). Exploiting the fact that different IMFs cover different characteristics, the method decomposes the initial Web page information with EMD to extract the relevant information, assigns different weights to the Web page information according to the features of each IMF, and then uses the Euclidean distance to analyze the degree of similarity. The results show that, compared with the previous approach of matching directly, the IMF-based method first decomposes the time series to eliminate the influence of noise and then matches using weighted processing, which greatly improves matching accuracy; the method is therefore effective.
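The abstract outlines a pipeline of EMD decomposition, per-IMF weighting, and Euclidean-distance matching, but gives no implementation details. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' method. Assumptions: envelopes are built by linear interpolation instead of the cubic splines used in standard EMD, sifting runs for a fixed number of iterations, the residual trend is left out of the comparison, and the (non-negative) weights are supplied by the caller. All function names (`local_extrema`, `envelope`, `emd`, `weighted_imf_distance`) are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def local_extrema(x: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Indices of interior local maxima and minima (a plateau counts once)."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] >= x[i - 1] and x[i] > x[i + 1]:
            maxima.append(i)
        elif x[i] <= x[i - 1] and x[i] < x[i + 1]:
            minima.append(i)
    return maxima, minima


def envelope(x: Sequence[float], idx: List[int]) -> List[float]:
    """Piecewise-linear curve through x at idx, pinned to both end points.

    Simplification (assumption): standard EMD uses cubic-spline envelopes.
    """
    knots = [0] + idx + [len(x) - 1]
    env, k = [], 0
    for i in range(len(x)):
        while knots[k + 1] < i:
            k += 1
        a, b = knots[k], knots[k + 1]
        t = 0.0 if b == a else (i - a) / (b - a)
        env.append(x[a] + t * (x[b] - x[a]))
    return env


def emd(signal: Sequence[float], max_imfs: int = 4,
        sift_iters: int = 10) -> Tuple[List[List[float]], List[float]]:
    """Split signal into IMFs (finest time scale first) plus a residual."""
    residual = list(signal)
    imfs: List[List[float]] = []
    for _ in range(max_imfs):
        mx, mn = local_extrema(residual)
        if len(mx) < 2 or len(mn) < 2:
            break  # (near-)monotonic residual: nothing left to extract
        h = residual[:]
        for _ in range(sift_iters):  # fixed-count sifting (assumption)
            mx, mn = local_extrema(h)
            if len(mx) < 2 or len(mn) < 2:
                break
            upper, lower = envelope(h, mx), envelope(h, mn)
            # Subtract the mean of the upper and lower envelopes.
            h = [v - (u + l) / 2 for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - v for r, v in zip(residual, h)]
    return imfs, residual


def weighted_imf_distance(a: Sequence[float], b: Sequence[float],
                          weights: Sequence[float], max_imfs: int = 4) -> float:
    """Euclidean distance accumulated IMF by IMF, each term scaled by a weight.

    Weights must be non-negative. Only IMFs present in both series are
    compared and the residual is ignored; both are illustrative choices.
    """
    imfs_a, _ = emd(a, max_imfs)
    imfs_b, _ = emd(b, max_imfs)
    total = 0.0
    for w, ia, ib in zip(weights, imfs_a, imfs_b):
        total += w * sum((p - q) ** 2 for p, q in zip(ia, ib))
    return math.sqrt(total)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 256
    clean = [math.sin(2 * math.pi * i / 64) for i in range(n)]
    noisy = [v + rng.gauss(0, 0.3) for v in clean]
    other = [math.sin(2 * math.pi * i / 40) for i in range(n)]
    w = [0.0, 0.5, 1.0, 1.0]  # hypothetical weights: ignore IMF 1
    print("clean vs noisy:", weighted_imf_distance(clean, noisy, w))
    print("clean vs other:", weighted_imf_distance(clean, other, w))
```

Giving the first IMF a weight of zero, as in the demo, reflects the common observation that the finest-scale IMF tends to absorb high-frequency noise. Whether that holds for a particular Web page series, and how the paper actually chooses its weights, is not stated in the abstract.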