Gradient-free optimization via integration
Christophe Andrieu, Nicolas Chopin, Ettore Fincato, Mathieu Gerber
arXiv:2408.00888 · arXiv - STAT - Computation · published 2024-08-01
In this paper we propose a novel, general-purpose algorithm to optimize functions $l\colon \mathbb{R}^d \rightarrow \mathbb{R}$ that are not assumed to be convex, differentiable, or even continuous. The main idea is to sequentially fit a sequence of parametric probability densities, possessing a concentration property, to $l$ using a Bayesian update followed by a reprojection back onto the chosen parametric family. Remarkably, when the densities are chosen from the exponential family, the reprojection essentially boils down to the computation of expectations. The algorithm therefore lends itself to Monte Carlo approximation, ranging from plain to Sequential Monte Carlo (SMC) methods, and is particularly simple to implement; we illustrate its performance on a challenging machine-learning classification problem.
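To make the recipe concrete, the following minimal sketch specialises it to an isotropic Gaussian family $\mathcal{N}(\mu, \sigma^2 I)$ fitted by plain self-normalised Monte Carlo. The tempering constant, the geometric variance schedule, and all names below are illustrative assumptions, not the paper's exact algorithm or schedule.

```python
import numpy as np

def gaussian_integration_optimizer(l, mu0, sigma0, n_iter=100, n_samples=500,
                                    lam=1.0, shrink=0.97, seed=None):
    """Illustrative sketch: fit a shrinking isotropic Gaussian to exp(-lam*l)
    by reweighting samples (Bayesian update) and matching the first moment
    (reprojection onto the Gaussian family). Hyperparameters are assumptions."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.asarray(mu0, dtype=float), float(sigma0)
    for _ in range(n_iter):
        # 1. Sample from the current parametric density N(mu, sigma^2 I).
        x = mu + sigma * rng.standard_normal((n_samples, mu.size))
        # 2. Bayesian reweighting with the tempered "likelihood" exp(-lam * l(x)).
        logw = -lam * np.apply_along_axis(l, 1, x)
        w = np.exp(logw - logw.max())
        w /= w.sum()
        # 3. Reprojection = computing an expectation (weighted first moment).
        mu = w @ x
        # 4. Shrink the scale so that the fitted densities concentrate.
        sigma *= shrink
    return mu

# Toy usage on a non-smooth, discontinuous objective.
l = lambda x: np.abs(x - 3.0).sum() + 0.5 * (x[0] < 0.0)
print(gaussian_integration_optimizer(l, mu0=np.zeros(2), sigma0=5.0, seed=0))
```

Replacing the plain Monte Carlo step with an SMC sampler, or averaging several noisy evaluations of $l$ inside the weight computation, fits the same template.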
Our methodology naturally extends to the scenario where only noisy measurements of $l$ are available, while retaining its ease of implementation and performance. At a theoretical level we establish, in a fairly general setting, that our framework can be viewed as implicitly implementing a time-inhomogeneous gradient descent algorithm on a sequence of smoothed approximations of $l$. This opens the door to establishing convergence of the algorithm and to providing theoretical guarantees. Along the way, we establish new results for inhomogeneous gradient descent algorithms of independent interest.
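To sketch why the mean updates above can be read as (time-inhomogeneous) gradient steps, consider again the illustrative isotropic Gaussian case, with notation that is ours rather than the paper's. Define the smoothed surrogate
$$ l_{\sigma}(\theta) \;=\; -\tfrac{1}{\lambda}\,\log \int_{\mathbb{R}^d} e^{-\lambda\, l(x)}\,\mathcal{N}(x;\theta,\sigma^{2} I)\,\mathrm{d}x. $$
Differentiating under the integral sign and using $\nabla_{\theta}\,\mathcal{N}(x;\theta,\sigma^{2} I) = \sigma^{-2}(x-\theta)\,\mathcal{N}(x;\theta,\sigma^{2} I)$ gives, with $\pi_{\theta,\sigma}(x) \propto e^{-\lambda\, l(x)}\,\mathcal{N}(x;\theta,\sigma^{2} I)$,
$$ \mathbb{E}_{\pi_{\theta,\sigma}}[X] \;=\; \theta - \lambda\sigma^{2}\,\nabla l_{\sigma}(\theta), $$
so setting the new mean to this reweighted expectation is a gradient step of size $\lambda\sigma^{2}$ on $l_{\sigma}$; letting $\sigma$ (and possibly $\lambda$) vary across iterations produces the kind of time-inhomogeneous descent on a sequence of smoothed approximations of $l$ referred to above.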