Application of ensemble models in web ranking

2010 5th International Symposium on Telecommunications. Pub Date: 2010-12-01. DOI: 10.1109/ISTEL.2010.5734118

Homa Baradaran Hashemi, N. Yazdani, A. Shakery, Mahdi Pakdaman Naeini


Citations: 21

Abstract

One of the most important parts of a search engine is the ranking unit. Many classical ranking algorithms based on content (such as TF-IDF and BM25) and on connectivity (such as HITS and PageRank) have been used in web search engines to find pages in response to a user query. Although these algorithms were developed to improve retrieval results, none of them exploits both the content of pages and their link structure. How to combine these sources of information effectively to maximize search accuracy therefore remains a challenging research question. In this study, we investigate the application of different ensemble models to ranking algorithms. Some of them are simple, such as the Sum, Product, and Borda rules; others are more complicated. We present three complex ensemble approaches. The first uses the OWA (ordered weighted averaging) operator to merge the results of various ranking algorithms. The second uses simulated click-through data, a state-of-the-art method, to learn how to combine many content and connectivity features of web pages. The third is a modified version of the SVM classifier customized for ranking problems. The proposed methods are evaluated on the LETOR and dotIR benchmark data sets. The experimental results show that in most cases the ensemble methods give better results than the individual ranking algorithms, and the improvements are very encouraging. The results also show that the OWA and SVM fusion methods are promising compared with the other ensemble models.
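The simple fusion rules named in the abstract (Sum, Product, Borda) and the OWA operator can be sketched as follows. This is a minimal illustration, not the authors' implementation: the min-max normalization step, the function names, the toy scores, and the OWA weights are all assumptions made for the example, and the abstract does not say how the OWA weights were chosen.

```python
import math
from typing import Dict, List

Scores = Dict[str, float]  # document id -> score assigned by one ranker


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one ranker's scores to [0, 1] (an assumed preprocessing step)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 1.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}


def sum_rule(rankers: List[Scores]) -> Scores:
    """Sum rule: add the scores each ranker gives a document."""
    return {d: sum(r[d] for r in rankers) for d in rankers[0]}


def product_rule(rankers: List[Scores]) -> Scores:
    """Product rule: multiply the scores each ranker gives a document."""
    return {d: math.prod(r[d] for r in rankers) for d in rankers[0]}


def borda_rule(rankers: List[Scores]) -> Dict[str, int]:
    """Borda rule: each ranker awards n-1 points to its top document, down to 0."""
    n = len(rankers[0])
    points = {d: 0 for d in rankers[0]}
    for r in rankers:
        ordered = sorted(r, key=r.get, reverse=True)
        for position, d in enumerate(ordered):
            points[d] += n - 1 - position
    return points


def owa(rankers: List[Scores], weights: List[float]) -> Scores:
    """OWA: sort each document's scores in descending order, then take a
    weighted sum in which weights attach to positions, not to rankers."""
    if len(weights) != len(rankers) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("need one weight per ranker, summing to 1")
    fused = {}
    for d in rankers[0]:
        ordered = sorted((r[d] for r in rankers), reverse=True)
        fused[d] = sum(w * s for w, s in zip(weights, ordered))
    return fused


def rank(fused: Dict[str, float]) -> List[str]:
    """Order documents by fused score, breaking ties by document id."""
    return sorted(fused, key=lambda d: (-fused[d], d))


# Toy scores from a content ranker and a connectivity ranker (made-up values).
content = {"a": 4.0, "b": 1.0, "c": 3.0}
links = {"a": 2.0, "b": 1.0, "c": 5.0}
normalized = [min_max_normalize(content), min_max_normalize(links)]

print(rank(sum_rule(normalized)))
print(rank(product_rule(normalized)))
print(borda_rule([content, links]))
print(owa(normalized, [0.0, 1.0]))  # weights (0, 1) reduce OWA to the minimum
```

With weights (1, 0) the same OWA function reduces to the maximum of the scores, and with equal weights to their mean, which is why the operator can cover a range of fusion behaviours between the two extremes.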


