Uncovering Cloaking Web Pages with Hybrid Detection Approaches
Jun Deng, Hao Chen, Jianhua Sun
2013 International Symposium on Computational and Business Intelligence, published 2013-08-24
DOI: 10.1109/ISCBI.2013.65 (https://doi.org/10.1109/ISCBI.2013.65)
Web search cloaking, which spammers use to increase the visiting rates of their websites, is a spamming technique that is challenging for search engines to detect. Existing cloaking detection systems have two shortcomings: the accuracy of their algorithms is not high enough, and the types of cloaking techniques that they can detect are limited. In this paper, we present a new system that addresses both problems. To improve detection accuracy, our algorithm combines text-, tag-, and URL-based methods. To detect more types of cloaking techniques, our system drives a real browser to execute the scripts in web pages, crawls each page a second time with a modified referrer field in the HTTP headers, and obtains the search engine's cached copy of the page for further comparison. We apply our system to 104,800 URLs extracted from Yahoo. The results show that our system achieves high accuracy, with a precision of 94.52% and a recall of 98.57%, and that it successfully detects more types of cloaking techniques.
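The abstract does not say how the text, tag, and URL signals are computed or combined, so the following Python sketch only illustrates the general idea of comparing two copies of a page (for example, the crawled copy and the search engine's cached copy). Every name here (`PageFeatures`, `difference_score`, `referred_request`), the Jaccard and sequence-ratio measures, and the weights are assumptions made for illustration, not the authors' algorithm.

```python
from difflib import SequenceMatcher
from html.parser import HTMLParser
from urllib.request import Request


class PageFeatures(HTMLParser):
    """Collects visible words, the tag sequence, and link targets of a page."""

    def __init__(self):
        super().__init__()
        self.words = set()
        self.tags = []
        self.links = set()
        self._skip = 0  # depth inside <script>/<style>, whose text is not visible

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag in ("script", "style"):
            self._skip += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.add(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.words.update(data.lower().split())


def features(html):
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    return parser


def jaccard(a, b):
    """Set overlap in [0, 1]; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def difference_score(html_a, html_b, weights=(0.5, 0.25, 0.25)):
    """Returns 0.0 for identical copies, approaching 1.0 as they diverge.

    The weights for (text, tag, URL) similarity are hypothetical.
    """
    fa, fb = features(html_a), features(html_b)
    sims = (
        jaccard(fa.words, fb.words),                      # text-based
        SequenceMatcher(None, fa.tags, fb.tags).ratio(),  # tag-based
        jaccard(fa.links, fb.links),                      # URL-based
    )
    return 1.0 - sum(w * s for w, s in zip(weights, sims))


def referred_request(url, referrer, user_agent="Mozilla/5.0"):
    """Builds (but does not send) a request that claims to come from `referrer`."""
    return Request(url, headers={"Referer": referrer, "User-Agent": user_agent})
```

Under these assumptions, a page would be fetched twice, once plainly and once via `referred_request(url, "https://search.yahoo.com/")`, and a large `difference_score` between the two copies, or between either copy and the cached copy, would mark the URL as a cloaking candidate. Any decision threshold would have to be tuned on labeled data.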