Author: Xiaojuan Yang
Published in: 2012 International Conference on Systems and Informatics (ICSAI2012), 2012-05-19
DOI: 10.1109/ICSAI.2012.6223520 (https://doi.org/10.1109/ICSAI.2012.6223520)
Study on the elimination of duplicated multimedia webpages
Multimedia web resources contain many duplicated web pages; eliminating these duplicates reduces storage costs and improves search engine performance. Based on an analysis of the classic duplicate-elimination algorithms, this article proposes an improved algorithm for judging whether the text of two web pages is repeated. The new algorithm bases elimination on page content, using it as the feature vector when comparing how similar web pages are, and analyzes how to capture a web page's theme. The result is a multidimensional improvement in the elimination of duplicated multimedia web pages.
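The abstract does not give the details of the improved algorithm, so the following is only a minimal sketch of the general idea of comparing pages through content-based feature vectors. It assumes plain word counts as the features, cosine similarity as the comparison, and a threshold of 0.9; the function names `term_vector`, `cosine_similarity`, and `deduplicate` are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercased word counts, used here as a page's feature vector (an assumption)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty page is treated as similar to nothing
    return dot / (norm_a * norm_b)


def deduplicate(pages, threshold=0.9):
    """Keep a page only if it is below the threshold against every page already kept."""
    kept = []  # list of (page_text, vector) pairs
    for page in pages:
        vec = term_vector(page)
        if all(cosine_similarity(vec, kept_vec) < threshold for _, kept_vec in kept):
            kept.append((page, vec))
    return [page for page, _ in kept]


if __name__ == "__main__":
    pages = [
        "the cat sat on the mat",
        "The cat sat on the mat!",
        "stock prices fell sharply today",
    ]
    for page in deduplicate(pages):
        print(page)
```

This sketch compares every new page against all pages kept so far, which is quadratic in the number of pages; a real crawler would likely need an index or fingerprinting step to scale, and the threshold would have to be tuned on actual data.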