Consensus-based ranking of Wikipedia topics

Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics. Pub Date: 2017-08-23. DOI: 10.1145/3106426.3106529

Waleed Nema, Yinshan Tang

Citations: 0

Abstract

To improve the effectiveness of users' information-seeking experience in interactive web search, we hypothesize how people might be influenced when making relevance judgments by introducing the Consensus Theory & Relevance Judgment Model (CT&M). This is combined with a practical path for assessing how far the suggestions of current search engines differ from user expectations. A user-centered, evidence-based, phenomenological approach is used to improve on Google PageRank (GPR) in two ways. First, we bias GPR's assumption of equal navigation probability using (f)actual usage statistics as implicit user consensus, which leads to the StatsRank (SR) algorithm. Second, we aggregate users' explicit rankings to derive Consensus Rank (CR), which is shown to predict individual user rankings significantly better than GPR and than a real-time meta-search of the modern search engines Google and Yahoo/Bing. CT&M contextualizes CR, SR, and a live, open online web experiment called The Ranking Game, which is based on the August 2016 English Wikipedia corpus (12.7 million pages) and Page View Statistics for May to July 2016. Limiting this work to Wikipedia makes GPR topic-based, since every Wikipedia page focuses on one topic. TREC's pooling method is used to merge the top 20 results from major search engines and present an alphabetized list that users rank explicitly via drag and drop. The same platform captures implicit data for future research and can be used for controlled experiments. Our contributions are CT&M, SR, CR, and the open online research platform for user-feedback web experiments.
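The three mechanisms named in the abstract (pooling, usage-biased PageRank, and aggregation of explicit rankings) can be illustrated with a minimal Python sketch. The abstract does not give the formulas for SR or CR, so everything below is an assumption made for illustration: the function names `pool_results`, `biased_pagerank`, and `borda_consensus` are hypothetical, link-following probabilities are assumed to be proportional to the target page's view count with +1 smoothing, the damping factor 0.85 is the conventional PageRank default, and a Borda count stands in for whatever aggregation rule the authors actually use.

```python
from collections import defaultdict


def pool_results(result_lists, depth=20):
    """Merge the top `depth` results of each engine into one alphabetized list."""
    pooled = set()
    for results in result_lists:
        pooled.update(results[:depth])
    return sorted(pooled)


def biased_pagerank(links, views, damping=0.85, iterations=50):
    """PageRank in which a link is followed with probability proportional to
    the target's page views rather than uniformly (assumed reading of SR)."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if not targets:
                # Dangling page: spread its rank evenly over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
                continue
            # +1 smoothing (an assumption) keeps never-viewed targets reachable.
            weights = {t: views.get(t, 0) + 1 for t in targets}
            total = sum(weights.values())
            for target, weight in weights.items():
                new_rank[target] += damping * rank[page] * weight / total
        rank = new_rank
    return rank


def borda_consensus(user_rankings):
    """Aggregate explicit user rankings with a Borda count (a stand-in for CR)."""
    scores = defaultdict(int)
    for ranking in user_rankings:
        for position, item in enumerate(ranking):
            scores[item] += len(ranking) - position
    # Higher score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))
```

Calling `biased_pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}, {"A": 100, "B": 10, "C": 1000})` returns a score per page that sums to 1, with the heavily viewed page C ranked above B. Setting every view count to the same value recovers the uniform navigation assumption of plain PageRank, which is the baseline the abstract says SR departs from.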
