Reinforcement Learning for Adaptive MCMC
Congye Wang, Wilson Chen, Heishiro Kanagawa, Chris J. Oates
arXiv:2405.13574 · arXiv - STAT - Computation · 2024-05-22
Abstract
An informal observation, made by several authors, is that the adaptive design of a Markov transition kernel has the flavour of a reinforcement learning task. Yet, to date it has remained unclear how to actually exploit modern reinforcement learning technologies for adaptive MCMC. The aim of this paper is to set out a general framework, called Reinforcement Learning Metropolis--Hastings, that is theoretically supported and empirically validated. Our principal focus is on learning fast-mixing Metropolis--Hastings transition kernels, which we cast as deterministic policies and optimise via a policy gradient. Control of the learning rate provably ensures that the conditions for ergodicity are satisfied. The methodology is used to construct a gradient-free sampler that outperforms a popular gradient-free adaptive Metropolis--Hastings algorithm on $\approx 90\%$ of tasks in the PosteriorDB benchmark.
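The sketch below illustrates the general idea described in the abstract; it is not the authors' RLMH algorithm. It shows a random-walk Metropolis--Hastings sampler whose proposal scale plays the role of a policy parameter and is adapted with a REINFORCE-style policy-gradient step, using the realised squared jump distance as the reward and a decaying learning rate so that adaptation diminishes over time (an informal analogue of the learning-rate control mentioned above). The target, the reward choice, the parameterisation, and all function names are assumptions made for this example.

```python
# Illustrative sketch only: adapting a random-walk Metropolis--Hastings proposal
# scale with a policy-gradient-style update. Not the RLMH algorithm of the paper.
import numpy as np

def log_target(x):
    # Example target: standard Gaussian log-density (unnormalised), assumed here.
    return -0.5 * np.sum(x ** 2)

def adaptive_mh_sketch(x0, n_iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    log_sigma = 0.0                          # policy parameter: log proposal scale
    samples = np.empty((n_iters, x.size))
    for t in range(n_iters):
        sigma = np.exp(log_sigma)
        eps = rng.standard_normal(x.size)
        proposal = x + sigma * eps           # Gaussian random-walk proposal
        log_alpha = log_target(proposal) - log_target(x)
        accept = np.log(rng.uniform()) < log_alpha
        # Reward: squared jump distance realised at this step (zero if rejected).
        reward = float(np.sum((proposal - x) ** 2)) if accept else 0.0
        # Score function of the Gaussian proposal with respect to log_sigma.
        score = np.sum(eps ** 2) - x.size
        grad = np.clip(reward * score, -10.0, 10.0)   # crude variance control
        lr = 0.1 / (t + 10.0) ** 0.6         # decaying rate: diminishing adaptation
        log_sigma += lr * grad
        if accept:
            x = proposal
        samples[t] = x
    return samples, np.exp(log_sigma)

samples, final_scale = adaptive_mh_sketch(np.zeros(2))
print("adapted proposal scale:", final_scale)
```

The decaying learning rate is the key hedge against breaking ergodicity: as adaptation slows, the transition kernel stabilises, which is the intuition behind the diminishing-adaptation conditions referenced in the abstract.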