Federated Online Learning to Rank with Evolution Strategies

E. Kharitonov
{"title":"联合在线学习与进化策略排序","authors":"E. Kharitonov","doi":"10.1145/3289600.3290968","DOIUrl":null,"url":null,"abstract":"Online Learning to Rank is a powerful paradigm that allows to train ranking models using only online feedback from its users.In this work, we consider Federated Online Learning to Rank setup (FOLtR) where on-mobile ranking models are trained in a way that respects the users' privacy. We require that the user data, such as queries, results, and their feature representations are never communicated for the purpose of the ranker's training. We believe this setup is interesting, as it combines unique requirements for the learning algorithm: (a) preserving the user privacy, (b) low communication and computation costs, (c) learning from noisy bandit feedback, and (d) learning with non-continuous ranking quality measures. We propose a learning algorithm FOLtR-ES that satisfies these requirements. A part of FOLtR-ES is a privatization procedure that allows it to provide ε-local differential privacy guarantees, i.e. protecting the clients from an adversary who has access to the communicated messages. This procedure can be applied to any absolute online metric that takes finitely many values or can be discretized to a finite domain. Our experimental study is based on a widely used click simulation approach and publicly available learning to rank datasets MQ2007 and MQ2008. We evaluate FOLtR-ES against offline baselines that are trained using relevance labels, linear regression model and RankingSVM. From our experiments, we observe that FOLtR-ES can optimize a ranking model to perform similarly to the baselines in terms of the optimized online metric, Max Reciprocal Rank.","PeriodicalId":143253,"journal":{"name":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-01-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Federated Online Learning to Rank with Evolution Strategies\",\"authors\":\"E. Kharitonov\",\"doi\":\"10.1145/3289600.3290968\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Online Learning to Rank is a powerful paradigm that allows to train ranking models using only online feedback from its users.In this work, we consider Federated Online Learning to Rank setup (FOLtR) where on-mobile ranking models are trained in a way that respects the users' privacy. We require that the user data, such as queries, results, and their feature representations are never communicated for the purpose of the ranker's training. We believe this setup is interesting, as it combines unique requirements for the learning algorithm: (a) preserving the user privacy, (b) low communication and computation costs, (c) learning from noisy bandit feedback, and (d) learning with non-continuous ranking quality measures. We propose a learning algorithm FOLtR-ES that satisfies these requirements. A part of FOLtR-ES is a privatization procedure that allows it to provide ε-local differential privacy guarantees, i.e. protecting the clients from an adversary who has access to the communicated messages. This procedure can be applied to any absolute online metric that takes finitely many values or can be discretized to a finite domain. Our experimental study is based on a widely used click simulation approach and publicly available learning to rank datasets MQ2007 and MQ2008. 
We evaluate FOLtR-ES against offline baselines that are trained using relevance labels, linear regression model and RankingSVM. From our experiments, we observe that FOLtR-ES can optimize a ranking model to perform similarly to the baselines in terms of the optimized online metric, Max Reciprocal Rank.\",\"PeriodicalId\":143253,\"journal\":{\"name\":\"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-01-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3289600.3290968\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3289600.3290968","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 24

Abstract

Online Learning to Rank is a powerful paradigm that makes it possible to train ranking models using only online feedback from users. In this work, we consider the Federated Online Learning to Rank setup (FOLtR), in which on-mobile ranking models are trained in a way that respects the users' privacy. We require that user data, such as queries, results, and their feature representations, are never communicated for the purpose of training the ranker. We believe this setup is interesting, as it combines unique requirements for the learning algorithm: (a) preserving user privacy, (b) low communication and computation costs, (c) learning from noisy bandit feedback, and (d) learning with non-continuous ranking quality measures. We propose a learning algorithm, FOLtR-ES, that satisfies these requirements. A part of FOLtR-ES is a privatization procedure that allows it to provide ε-local differential privacy guarantees, i.e. protecting the clients from an adversary who has access to the communicated messages. This procedure can be applied to any absolute online metric that takes finitely many values or can be discretized to a finite domain. Our experimental study is based on a widely used click simulation approach and the publicly available learning-to-rank datasets MQ2007 and MQ2008. We evaluate FOLtR-ES against offline baselines that are trained using relevance labels: a linear regression model and RankingSVM. From our experiments, we observe that FOLtR-ES can optimize a ranking model to perform similarly to the baselines in terms of the optimized online metric, Max Reciprocal Rank.
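To make the setup concrete, the following is a minimal, hypothetical Python/NumPy sketch of the two ingredients the abstract names: a randomized-response privatization step that yields ε-local differential privacy over a finite set of metric values, and a server-side evolution strategies update computed from the privatized client reports. The function names (privatize, client_round, server_update), the linear-model assumption, and all parameters are illustrative and are not taken from the paper's implementation.

# Hypothetical sketch of the ideas described in the abstract; not the paper's code.
import numpy as np

def privatize(value, candidates, epsilon, rng):
    """Randomized response over finitely many metric values: report the true
    value with probability e^eps / (e^eps + k - 1), otherwise report a
    uniformly chosen different value. This is one standard way to obtain
    eps-local differential privacy for a finitely-valued report."""
    k = len(candidates)
    p_true = np.exp(epsilon) / (np.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    others = [c for c in candidates if c != value]
    return float(rng.choice(others))

def client_round(theta, sigma, epsilon, candidates, evaluate_metric, rng):
    """On-device step: evaluate the online metric (e.g. the reciprocal rank of
    the clicked result) under a randomly perturbed model, then send back only
    the perturbation seed and the privatized metric value -- never queries,
    results, or feature vectors."""
    seed = int(rng.integers(0, 2**31 - 1))
    noise = np.random.default_rng(seed).standard_normal(theta.shape)
    metric = evaluate_metric(theta + sigma * noise)
    return seed, privatize(metric, candidates, epsilon, rng)

def server_update(theta, messages, sigma, lr):
    """Evolution strategies step: reconstruct each client's perturbation from
    its seed, weight it by the reported metric, and take a gradient-ascent
    step on the estimated expected online metric."""
    grad = np.zeros_like(theta)
    for seed, reported in messages:
        noise = np.random.default_rng(seed).standard_normal(theta.shape)
        grad += reported * noise
    grad /= (len(messages) * sigma)
    return theta + lr * grad

Under these assumptions, each interaction sends only a perturbation seed and a single privatized, finitely-valued metric report, so per-client communication and on-device computation stay small, in line with requirements (a) and (b) above.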