{"title":"Improving Convergence in IRGAN with PPO","authors":"Moksh Jain, S. Kamath","doi":"10.1145/3371158.3371209","DOIUrl":null,"url":null,"abstract":"Information retrieval modeling aims to optimise generative and discriminative retrieval strategies, where, generative retrieval focuses on predicting query-specific relevant documents and discriminative retrieval tries to predict relevancy given a query-document pair. IRGAN unifies the generative and discriminative retrieval approaches through a minimax game. However, training IRGAN is unstable and varies largely with the random initialization of parameters. In this work, we propose improvements to IRGAN training through a novel optimization objective based on proximal policy optimisation and gumbel-softmax based sampling for the generator, along with a modified training algorithm which performs the gradient update on both the models simultaneously for each training iteration. We benchmark our proposed approach against IRGAN on three different information retrieval tasks and present empirical evidence of improved convergence.","PeriodicalId":360747,"journal":{"name":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 7th ACM IKDD CoDS and 25th COMAD","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3371158.3371209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 2
Abstract
Information retrieval modeling aims to optimize generative and discriminative retrieval strategies: generative retrieval focuses on predicting query-specific relevant documents, while discriminative retrieval predicts the relevance of a given query-document pair. IRGAN unifies the generative and discriminative retrieval approaches through a minimax game. However, IRGAN training is unstable and varies greatly with the random initialization of parameters. In this work, we propose improvements to IRGAN training through a novel optimization objective based on proximal policy optimization (PPO) and Gumbel-Softmax-based sampling for the generator, along with a modified training algorithm that performs the gradient update on both models simultaneously in each training iteration. We benchmark our proposed approach against IRGAN on three different information retrieval tasks and present empirical evidence of improved convergence.
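Below is a minimal PyTorch sketch of the two generator-side ingredients the abstract names: Gumbel-Softmax sampling over candidate documents and a PPO-style clipped surrogate objective, with a discriminator relevance score standing in as the reward. All names (`doc_logits`, `advantages`, the toy values) are illustrative assumptions for exposition, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def ppo_clip_loss(new_log_probs, old_log_probs, advantages, eps=0.2):
    """PPO clipped surrogate loss (Schulman et al., 2017).

    new_log_probs / old_log_probs: log-probabilities of the sampled
    documents under the current and pre-update generator policies.
    advantages: here, a centred discriminator relevance score acting
    as the generator's reward signal.
    """
    ratio = torch.exp(new_log_probs - old_log_probs)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantages
    # Negative sign: minimizing this loss maximizes the surrogate objective.
    return -torch.min(unclipped, clipped).mean()

# Toy example: a generator scoring 5 candidate documents for one query.
doc_logits = torch.randn(5, requires_grad=True)

# Gumbel-Softmax yields a differentiable (soft) sample over the candidate
# pool, sidestepping the high-variance REINFORCE estimator.
soft_sample = F.gumbel_softmax(doc_logits, tau=0.5, hard=False)

# Log-probability of the soft selection under the current policy, and a
# frozen pre-update snapshot standing in for the old policy.
new_log_probs = (soft_sample * F.log_softmax(doc_logits, dim=-1)).sum()
old_log_probs = new_log_probs.detach()

# Pretend the discriminator scored the sample 0.7; centre it as an advantage.
advantages = torch.tensor(0.7) - 0.5

loss = ppo_clip_loss(new_log_probs, old_log_probs, advantages)
loss.backward()  # one generator gradient step
```

Under the abstract's modified training algorithm, a step of this kind for the generator and a gradient step for the discriminator would both occur in the same training iteration, rather than in alternating multi-step phases.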