Homa Baradaran Hashemi, N. Yazdani, A. Shakery, Mahdi Pakdaman Naeini
Title: Application of ensemble models in web ranking
Venue: 2010 5th International Symposium on Telecommunications
Published: 2010-12-01
DOI: 10.1109/ISTEL.2010.5734118 (https://doi.org/10.1109/ISTEL.2010.5734118)
Citations: 21
Abstract
The ranking unit is one of the most important components of a search engine. Many classical ranking algorithms, based either on content (such as TF-IDF and BM25) or on connectivity (such as HITS and PageRank), have been used in web search engines to find pages in response to a user query. Although these algorithms were developed to improve retrieval results, none of them exploits both page content and link structure. How to combine the available information effectively to maximize search accuracy therefore remains a challenging research question. In this study, we investigate the application of different ensemble models to ranking algorithms. Some are simple, such as the Sum, Product, and Borda rules; others are more complex. We present three complex ensemble approaches. The first uses the ordered weighted averaging (OWA) operator to merge the results of various ranking algorithms. The second, a state-of-the-art method, uses simulated click-through data to learn how to combine many content and connectivity features of web pages. The third is a modified version of the SVM classifier customized for ranking problems. The proposed methods are evaluated on the LETOR and dotIR benchmark data sets. The experimental results show that in most cases the ensemble methods give better results, and the improvements are very encouraging. The results also show that the OWA and SVM fusion methods are promising compared with the other ensemble models.
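
The abstract names the Sum, Product, and Borda rules and the OWA operator but does not give their formulas. As an illustration only, the sketch below uses the standard textbook definitions of these fusion rules, which may differ in detail from the paper's. It assumes that each ranker's scores have already been scaled to [0, 1] and that every ranker scores the same set of documents. All function names, document ids, and scores are hypothetical.

```python
from typing import Dict, List

Scores = Dict[str, float]  # document id -> score from one ranker, assumed in [0, 1]


def sum_rule(rankers: List[Scores]) -> Scores:
    """Sum rule: add each document's scores across rankers."""
    return {d: sum(r[d] for r in rankers) for d in rankers[0]}


def product_rule(rankers: List[Scores]) -> Scores:
    """Product rule: multiply each document's scores across rankers."""
    fused = {}
    for d in rankers[0]:
        p = 1.0
        for r in rankers:
            p *= r[d]
        fused[d] = p
    return fused


def borda_rule(rankers: List[Scores]) -> Dict[str, int]:
    """Borda rule: a document ranked at position pos (0 = best) among n
    documents earns n - 1 - pos points from that ranker."""
    n = len(rankers[0])
    points = {d: 0 for d in rankers[0]}
    for r in rankers:
        ordered = sorted(r, key=r.get, reverse=True)
        for pos, d in enumerate(ordered):
            points[d] += n - 1 - pos
    return points


def owa(rankers: List[Scores], weights: List[float]) -> Scores:
    """OWA: sort each document's scores in descending order, then take a
    weighted sum. Weights attach to sorted positions, not to rankers."""
    if len(weights) != len(rankers) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per ranker, summing to 1")
    fused = {}
    for d in rankers[0]:
        values = sorted((r[d] for r in rankers), reverse=True)
        fused[d] = sum(w * v for w, v in zip(weights, values))
    return fused


def ranking(fused: Dict[str, float]) -> List[str]:
    """Order documents by fused score, best first."""
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical scores from one content-based and one link-based ranker.
content = {"a": 1.0, "b": 0.5, "c": 0.0}
links = {"a": 0.25, "b": 1.0, "c": 0.5}
rankers = [content, links]

print(ranking(sum_rule(rankers)))
print(ranking(product_rule(rankers)))
print(ranking(borda_rule(rankers)))
print(owa(rankers, [0.5, 0.5]))
```

In this formulation, OWA weights of `[1.0, 0.0]` reduce to taking each document's maximum score, `[0.0, 1.0]` to the minimum, and equal weights to the arithmetic mean, so the weight vector controls how optimistic the fusion is.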