Title: Similarity Webpage Denoising Data Clustering Algorithm Based on Time Series
Authors: Hang Chun-mei, Wu Yang-yang
Venue: 2015 Seventh International Conference on Measuring Technology and Mechatronics Automation
Published: 2015-06-13
DOI: 10.1109/ICMTMA.2015.240
Citations: 0
Abstract
Large volumes of non-stationary or non-first-order sequential Web page data typically exhibit a very high noise ratio, and empirical mode decomposition (EMD) is often chosen to process such data. Applying EMD to the sequence data yields a set of intrinsic mode functions (IMFs) and a residual series. Each IMF captures local characteristics of the data over a different time scale, which gives the decomposition a denoising effect. Exploiting the fact that different IMFs cover different characteristics, the method first obtains the initial Web page information and decomposes it with EMD to extract the relevant information from the page. It then assigns different weights to the Web page information according to the features of each IMF, and finally uses the Euclidean distance to measure the degree of similarity. The results show that, compared with the previous approach of matching the data directly, the IMF-based approach, which first decomposes the time series to eliminate the influence of noise and then performs weighted matching, greatly improves matching accuracy, indicating that the method is effective.
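The abstract describes the pipeline (EMD into IMFs plus a residual, per-IMF weights, Euclidean distance) only at a high level. The following is a minimal pure-Python sketch under our own assumptions, not the authors' implementation: each Web page is assumed to be already mapped to a numeric sequence of equal length; envelopes are piecewise-linear between local extrema (standard EMD uses cubic splines) with the series endpoints pinned as knots; sifting runs a fixed number of passes instead of a formal stopping criterion; the residual trend is left out of the distance. The names `emd` and `weighted_imf_distance` and the weight values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def _local_extrema(x: Sequence[float]) -> Tuple[List[int], List[int]]:
    """Indices of interior local maxima and minima of x."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i] > x[i - 1] and x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i] < x[i - 1] and x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima


def _envelope(x: Sequence[float], idx: List[int]) -> List[float]:
    """Piecewise-linear envelope through x at indices idx.

    Simplification (assumption): standard EMD fits cubic splines; here we
    interpolate linearly and pin the two endpoints of x as extra knots.
    """
    knots = [0] + idx + [len(x) - 1]
    env = list(x)
    for a, b in zip(knots, knots[1:]):
        for t in range(a, b + 1):
            env[t] = x[a] + (x[b] - x[a]) * (t - a) / (b - a)
    return env


def emd(x: Sequence[float], max_imfs: int = 4,
        sift_iters: int = 10) -> Tuple[List[List[float]], List[float]]:
    """Split x into up to max_imfs IMFs plus a residual.

    sum(imfs) + residual reproduces x up to floating-point rounding.
    """
    residual = [float(v) for v in x]
    imfs: List[List[float]] = []
    for _ in range(max_imfs):
        maxima, minima = _local_extrema(residual)
        if len(maxima) < 2 or len(minima) < 2:
            break  # too few oscillations left: keep the rest as the trend
        h = residual[:]
        # Fixed number of sifting passes (assumption: no SD stopping rule).
        for _ in range(sift_iters):
            mx, mn = _local_extrema(h)
            if len(mx) < 2 or len(mn) < 2:
                break
            upper, lower = _envelope(h, mx), _envelope(h, mn)
            # Subtract the mean of the upper and lower envelopes.
            h = [v - (u + l) / 2.0 for v, u, l in zip(h, upper, lower)]
        imfs.append(h)
        residual = [r - v for r, v in zip(residual, h)]
    return imfs, residual


def weighted_imf_distance(x: Sequence[float], y: Sequence[float],
                          weights: Sequence[float]) -> float:
    """Euclidean distance between x and y computed per IMF level.

    weights[k] scales the squared distance between the k-th IMFs
    (hypothetical weighting; the abstract does not give the values).
    Missing IMF levels are treated as all-zero components.
    """
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    imfs_x, _ = emd(x, max_imfs=len(weights))
    imfs_y, _ = emd(y, max_imfs=len(weights))
    zeros = [0.0] * len(x)
    total = 0.0
    for k, w in enumerate(weights):
        a = imfs_x[k] if k < len(imfs_x) else zeros
        b = imfs_y[k] if k < len(imfs_y) else zeros
        total += w * sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.sqrt(total)
```

EMD extracts IMFs from the highest to the lowest frequency, so giving the first IMF a small or zero weight (for example `weights=[0.0, 1.0, 1.0]`) suppresses high-frequency fluctuations before the distance is taken; the weighting actually used by the authors is not stated in the abstract.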