A new enhanced technique for link farm detection
D. Saraswathi, A. V. Kathiravan, R. Kavitha
International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012), 2012-03-21
DOI: https://doi.org/10.1109/ICPRIME.2012.6208290
Citation count: 1
Abstract
Search engine spam refers to web pages designed to artificially inflate their search engine ranking. Such spam has increased dramatically in recent years and creates problems for both search engines and web users: it degrades the quality of search results, takes up more memory and indexing time, and frustrates users by returning irrelevant results. Search engines have tried many techniques to filter out spam pages before they appear on the query results page. Spammers try to increase the PageRank of certain spam pages by creating a large number of links pointing to them. We have designed and developed a system, the spamcity score, that detects spam hosts or pages on the Web. The UK Web Spam 2007 data set, a public web spam data set annotated at the host level, is used for all results reported here. The system uses key features of popular link-based algorithms to improve spam detection. This paper presents various ways of creating spam pages, a collection of current methods used to detect spam, and a new approach to building a tool that improves link spam detection using the spamcity score of term spam. The new approach uses the SVMLight tool to detect link spam, considering both the link structure of the Web and page contents. These statistical features are used to build a classifier that is tested on a large collection of Web link spam. Link farms can be identified using the Web graph, classification with the SVMLight tool, degree-based measures, PageRank, TrustRank, and Truncated PageRank. The spam classifier uses the WordNet word database and the SVMLight tool to classify web links as either spam or not spam. The features relate not only to quantitative data extracted from Web pages but also to qualitative properties, mainly of the page links.
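To make the link-based features mentioned in the abstract more concrete, here is a minimal sketch of how in-degree, out-degree, and PageRank might be computed for a small host graph. It is not the authors' system: the function name `link_features`, the toy graph, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration, and the SVMLight classification step is not shown.

```python
from collections import defaultdict


def link_features(edges, damping=0.85, iterations=50):
    """Return {node: (in_degree, out_degree, pagerank)} for a directed graph.

    Hypothetical helper for illustration; PageRank is computed by plain
    power iteration with uniform teleportation.
    """
    nodes = sorted({n for edge in edges for n in edge})
    out_links = defaultdict(set)
    in_links = defaultdict(set)
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {}
        for v in nodes:
            incoming = sum(rank[u] / len(out_links[u]) for u in in_links[v])
            new_rank[v] = (1 - damping) / n + damping * (incoming + dangling / n)
        rank = new_rank
    return {v: (len(in_links[v]), len(out_links[v]), rank[v]) for v in nodes}


# Toy graph (assumed data): f1..f4 form a small farm that links to "target",
# and "target" links back to each of them; "a" and "b" are ordinary hosts.
farm = [f"f{i}" for i in range(1, 5)]
edges = [(f, "target") for f in farm] + [("target", f) for f in farm]
edges += [("a", "b"), ("b", "a"), ("a", "target")]

features = link_features(edges)
for host, (indeg, outdeg, pr) in sorted(features.items(), key=lambda kv: -kv[1][2]):
    print(f"{host:7s} in={indeg} out={outdeg} pagerank={pr:.3f}")
```

In this toy graph the farm concentrates rank on `target`, which is the kind of effect degree-based and PageRank-based features are meant to expose. In a pipeline like the one the abstract describes, per-host feature vectors of this kind would be passed to a classifier.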