Convergence of Langevin-simulated annealing algorithms with multiplicative noise

Pierre Bras, Gilles Pagès
{"title":"Convergence of Langevin-simulated annealing algorithms with multiplicative noise","authors":"Pierre Bras, Gilles Pagès","doi":"10.1090/mcom/3899","DOIUrl":null,"url":null,"abstract":"<p>We study the convergence of Langevin-Simulated Annealing type algorithms with multiplicative noise, i.e. for <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"upper V colon double-struck upper R Superscript d Baseline right-arrow double-struck upper R\"> <mml:semantics> <mml:mrow> <mml:mi>V</mml:mi> <mml:mo>:</mml:mo> <mml:msup> <mml:mrow> <mml:mi mathvariant=\"double-struck\">R</mml:mi> </mml:mrow> <mml:mi>d</mml:mi> </mml:msup> <mml:mo stretchy=\"false\">→<!-- → --></mml:mo> <mml:mrow> <mml:mi mathvariant=\"double-struck\">R</mml:mi> </mml:mrow> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">V : \\mathbb {R}^d \\to \\mathbb {R}</mml:annotation> </mml:semantics> </mml:math> </inline-formula> a potential function to minimize, we consider the stochastic differential equation <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"d upper Y Subscript t Baseline equals minus sigma sigma Superscript down-tack Baseline nabla upper V left-parenthesis upper Y Subscript t Baseline right-parenthesis\"> <mml:semantics> <mml:mrow> <mml:mi>d</mml:mi> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo>=</mml:mo> <mml:mo>−<!-- − --></mml:mo> <mml:mi>σ<!-- σ --></mml:mi> <mml:msup> <mml:mi>σ<!-- σ --></mml:mi> <mml:mi mathvariant=\"normal\">⊤<!-- ⊤ --></mml:mi> </mml:msup> <mml:mi mathvariant=\"normal\">∇<!-- ∇ --></mml:mi> <mml:mi>V</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">dY_t = - \\sigma \\sigma ^\\top \\nabla V(Y_t)</mml:annotation> </mml:semantics> </mml:math> </inline-formula> <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"d t plus a left-parenthesis t right-parenthesis sigma left-parenthesis upper Y Subscript t Baseline right-parenthesis d upper W Subscript t plus a left-parenthesis t right-parenthesis squared normal upper Upsilon left-parenthesis upper Y Subscript t Baseline right-parenthesis d t\"> <mml:semantics> <mml:mrow> <mml:mi>d</mml:mi> <mml:mi>t</mml:mi> <mml:mo>+</mml:mo> <mml:mi>a</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>t</mml:mi> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mi>σ<!-- σ --></mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mi>d</mml:mi> <mml:msub> <mml:mi>W</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo>+</mml:mo> <mml:mi>a</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>t</mml:mi> <mml:msup> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mn>2</mml:mn> </mml:msup> <mml:mi mathvariant=\"normal\">Υ<!-- Υ --></mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mi>d</mml:mi> <mml:mi>t</mml:mi> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">dt + a(t)\\sigma (Y_t)dW_t + a(t)^2\\Upsilon (Y_t)dt</mml:annotation> </mml:semantics> </mml:math> </inline-formula>, where <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" 
alttext=\"left-parenthesis upper W Subscript t Baseline right-parenthesis\"> <mml:semantics> <mml:mrow> <mml:mo stretchy=\"false\">(</mml:mo> <mml:msub> <mml:mi>W</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">(W_t)</mml:annotation> </mml:semantics> </mml:math> </inline-formula> is a Brownian motion, where <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"sigma colon double-struck upper R Superscript d Baseline right-arrow script upper M Subscript d Baseline left-parenthesis double-struck upper R right-parenthesis\"> <mml:semantics> <mml:mrow> <mml:mi>σ<!-- σ --></mml:mi> <mml:mo>:</mml:mo> <mml:msup> <mml:mrow> <mml:mi mathvariant=\"double-struck\">R</mml:mi> </mml:mrow> <mml:mi>d</mml:mi> </mml:msup> <mml:mo stretchy=\"false\">→<!-- → --></mml:mo> <mml:msub> <mml:mrow> <mml:mi mathvariant=\"script\">M</mml:mi> </mml:mrow> <mml:mi>d</mml:mi> </mml:msub> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mrow> <mml:mi mathvariant=\"double-struck\">R</mml:mi> </mml:mrow> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">\\sigma : \\mathbb {R}^d \\to \\mathcal {M}_d(\\mathbb {R})</mml:annotation> </mml:semantics> </mml:math> </inline-formula> is an adaptive (multiplicative) noise, where <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"a colon double-struck upper R Superscript plus Baseline right-arrow double-struck upper R Superscript plus\"> <mml:semantics> <mml:mrow> <mml:mi>a</mml:mi> <mml:mo>:</mml:mo> <mml:msup> <mml:mrow> <mml:mi mathvariant=\"double-struck\">R</mml:mi> </mml:mrow> <mml:mo>+</mml:mo> </mml:msup> <mml:mo stretchy=\"false\">→<!-- → --></mml:mo> <mml:msup> <mml:mrow> <mml:mi mathvariant=\"double-struck\">R</mml:mi> </mml:mrow> <mml:mo>+</mml:mo> </mml:msup> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">a : \\mathbb {R}^+ \\to \\mathbb {R}^+</mml:annotation> </mml:semantics> </mml:math> </inline-formula> is a function decreasing to <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"0\"> <mml:semantics> <mml:mn>0</mml:mn> <mml:annotation encoding=\"application/x-tex\">0</mml:annotation> </mml:semantics> </mml:math> </inline-formula> and where <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"normal upper Upsilon\"> <mml:semantics> <mml:mi mathvariant=\"normal\">Υ<!-- Υ --></mml:mi> <mml:annotation encoding=\"application/x-tex\">\\Upsilon</mml:annotation> </mml:semantics> </mml:math> </inline-formula> is a correction term. 
This setting can be applied to optimization problems arising in Machine Learning; allowing <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"sigma\"> <mml:semantics> <mml:mi>σ<!-- σ --></mml:mi> <mml:annotation encoding=\"application/x-tex\">\\sigma</mml:annotation> </mml:semantics> </mml:math> </inline-formula> to depend on the position brings faster convergence in comparison with the classical Langevin equation <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"d upper Y Subscript t Baseline equals minus nabla upper V left-parenthesis upper Y Subscript t Baseline right-parenthesis d t plus sigma d upper W Subscript t\"> <mml:semantics> <mml:mrow> <mml:mi>d</mml:mi> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo>=</mml:mo> <mml:mo>−<!-- − --></mml:mo> <mml:mi mathvariant=\"normal\">∇<!-- ∇ --></mml:mi> <mml:mi>V</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mi>d</mml:mi> <mml:mi>t</mml:mi> <mml:mo>+</mml:mo> <mml:mi>σ<!-- σ --></mml:mi> <mml:mi>d</mml:mi> <mml:msub> <mml:mi>W</mml:mi> <mml:mi>t</mml:mi> </mml:msub> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">dY_t = -\\nabla V(Y_t)dt + \\sigma dW_t</mml:annotation> </mml:semantics> </mml:math> </inline-formula>. The case where <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"sigma\"> <mml:semantics> <mml:mi>σ<!-- σ --></mml:mi> <mml:annotation encoding=\"application/x-tex\">\\sigma</mml:annotation> </mml:semantics> </mml:math> </inline-formula> is a constant matrix has been extensively studied; however little attention has been paid to the general case. 
We prove the convergence for the <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"upper L Superscript 1\"> <mml:semantics> <mml:msup> <mml:mi>L</mml:mi> <mml:mn>1</mml:mn> </mml:msup> <mml:annotation encoding=\"application/x-tex\">L^1</mml:annotation> </mml:semantics> </mml:math> </inline-formula>-Wasserstein distance of <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"upper Y Subscript t\"> <mml:semantics> <mml:msub> <mml:mi>Y</mml:mi> <mml:mi>t</mml:mi> </mml:msub> <mml:annotation encoding=\"application/x-tex\">Y_t</mml:annotation> </mml:semantics> </mml:math> </inline-formula> and of the associated Euler scheme <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"upper Y overbar Subscript t\"> <mml:semantics> <mml:msub> <mml:mrow> <mml:mover> <mml:mi>Y</mml:mi> <mml:mo stretchy=\"false\">¯<!-- ¯ --></mml:mo> </mml:mover> </mml:mrow> <mml:mi>t</mml:mi> </mml:msub> <mml:annotation encoding=\"application/x-tex\">\\bar {Y}_t</mml:annotation> </mml:semantics> </mml:math> </inline-formula> to some measure <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"nu Superscript star\"> <mml:semantics> <mml:msup> <mml:mi>ν<!-- ν --></mml:mi> <mml:mo>⋆<!-- ⋆ --></mml:mo> </mml:msup> <mml:annotation encoding=\"application/x-tex\">\\nu ^\\star</mml:annotation> </mml:semantics> </mml:math> </inline-formula> which is supported by <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"a r g m i n left-parenthesis upper V right-parenthesis\"> <mml:semantics> <mml:mrow> <mml:mi>argmin</mml:mi> <mml:mo>⁡<!-- ⁡ --></mml:mo> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>V</mml:mi> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">\\operatorname {argmin}(V)</mml:annotation> </mml:semantics> </mml:math> </inline-formula> and give rates of convergence to the instantaneous Gibbs measure <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"nu Subscript a left-parenthesis t right-parenthesis\"> <mml:semantics> <mml:msub> <mml:mi>ν<!-- ν --></mml:mi> <mml:mrow> <mml:mi>a</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>t</mml:mi> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> </mml:msub> <mml:annotation encoding=\"application/x-tex\">\\nu _{a(t)}</mml:annotation> </mml:semantics> </mml:math> </inline-formula> of density <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"proportional-to exp left-parenthesis minus 2 upper V left-parenthesis x right-parenthesis slash a left-parenthesis t right-parenthesis squared right-parenthesis\"> <mml:semantics> <mml:mrow> <mml:mo>∝<!-- ∝ --></mml:mo> <mml:mi>exp</mml:mi> <mml:mo>⁡<!-- ⁡ --></mml:mo> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mo>−<!-- − --></mml:mo> <mml:mn>2</mml:mn> <mml:mi>V</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>x</mml:mi> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mrow> <mml:mo>/</mml:mo> </mml:mrow> <mml:mi>a</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>t</mml:mi> <mml:msup> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mn>2</mml:mn> </mml:msup> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">\\propto 
\\exp (-2V(x)/a(t)^2)</mml:annotation> </mml:semantics> </mml:math> </inline-formula>. To do so, we first consider the case where <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"a\"> <mml:semantics> <mml:mi>a</mml:mi> <mml:annotation encoding=\"application/x-tex\">a</mml:annotation> </mml:semantics> </mml:math> </inline-formula> is a piecewise constant function. We find again the classical schedule <inline-formula content-type=\"math/mathml\"> <mml:math xmlns:mml=\"http://www.w3.org/1998/Math/MathML\" alttext=\"a left-parenthesis t right-parenthesis equals upper A log Superscript negative 1 slash 2 Baseline left-parenthesis t right-parenthesis\"> <mml:semantics> <mml:mrow> <mml:mi>a</mml:mi> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>t</mml:mi> <mml:mo stretchy=\"false\">)</mml:mo> <mml:mo>=</mml:mo> <mml:mi>A</mml:mi> <mml:msup> <mml:mi>log</mml:mi> <mml:mrow> <mml:mo>−<!-- − --></mml:mo> <mml:mn>1</mml:mn> <mml:mrow> <mml:mo>/</mml:mo> </mml:mrow> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>⁡<!-- ⁡ --></mml:mo> <mml:mo stretchy=\"false\">(</mml:mo> <mml:mi>t</mml:mi> <mml:mo stretchy=\"false\">)</mml:mo> </mml:mrow> <mml:annotation encoding=\"application/x-tex\">a(t) = A\\log ^{-1/2}(t)</mml:annotation> </mml:semantics> </mml:math> </inline-formula>. We then prove the convergence for the general case by giving bounds for the Wasserstein distance to the stepwise constant case using ergodicity properties.</p>","PeriodicalId":2,"journal":{"name":"ACS Applied Bio Materials","volume":null,"pages":null},"PeriodicalIF":4.6000,"publicationDate":"2024-03-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACS Applied Bio Materials","FirstCategoryId":"100","ListUrlMain":"https://doi.org/10.1090/mcom/3899","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"MATERIALS SCIENCE, BIOMATERIALS","Score":null,"Total":0}

Abstract

We study the convergence of Langevin-simulated annealing type algorithms with multiplicative noise, i.e. for $V : \mathbb{R}^d \to \mathbb{R}$ a potential function to minimize, we consider the stochastic differential equation $dY_t = -\sigma\sigma^\top \nabla V(Y_t)\,dt + a(t)\sigma(Y_t)\,dW_t + a(t)^2\Upsilon(Y_t)\,dt$, where $(W_t)$ is a Brownian motion, $\sigma : \mathbb{R}^d \to \mathcal{M}_d(\mathbb{R})$ is an adaptive (multiplicative) noise, $a : \mathbb{R}^+ \to \mathbb{R}^+$ is a function decreasing to $0$, and $\Upsilon$ is a correction term. This setting can be applied to optimization problems arising in Machine Learning; allowing $\sigma$ to depend on the position brings faster convergence in comparison with the classical Langevin equation $dY_t = -\nabla V(Y_t)\,dt + \sigma\,dW_t$. The case where $\sigma$ is a constant matrix has been extensively studied; however, little attention has been paid to the general case. We prove the convergence in $L^1$-Wasserstein distance of $Y_t$ and of the associated Euler scheme $\bar{Y}_t$ to some measure $\nu^\star$ supported by $\operatorname{argmin}(V)$, and we give rates of convergence to the instantaneous Gibbs measure $\nu_{a(t)}$ of density $\propto \exp(-2V(x)/a(t)^2)$. To do so, we first consider the case where $a$ is a piecewise constant function; we recover the classical schedule $a(t) = A\log^{-1/2}(t)$. We then prove the convergence in the general case by bounding the Wasserstein distance to the piecewise constant case using ergodicity properties.
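
To illustrate the dynamics studied in the abstract, the following is a minimal sketch of an Euler-Maruyama discretization of the SDE $dY_t = -\sigma\sigma^\top \nabla V(Y_t)\,dt + a(t)\sigma(Y_t)\,dW_t + a(t)^2\Upsilon(Y_t)\,dt$ with the schedule $a(t) = A\log^{-1/2}(t)$. The function names (`euler_langevin_sa`, `grad_V`, `sigma`, `upsilon`), the time shift `t0` that keeps $\log(t)$ positive, and the caller-supplied correction term `upsilon` are illustrative assumptions, not notation from the paper; the exact form of $\Upsilon$ is specified there.

```python
import numpy as np

def euler_langevin_sa(grad_V, sigma, upsilon, y0, n_steps, dt, A=1.0, t0=np.e, rng=None):
    """Euler-Maruyama sketch (under the assumptions stated above) of
        dY_t = -sigma sigma^T grad V(Y_t) dt + a(t) sigma(Y_t) dW_t + a(t)^2 Upsilon(Y_t) dt
    with the cooling schedule a(t) = A * log(t)^{-1/2}, started at time t0 > 1."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y0, dtype=float).copy()
    d = y.size
    path = [y.copy()]
    for k in range(n_steps):
        t = t0 + k * dt
        a_t = A / np.sqrt(np.log(t))              # decreasing noise schedule
        S = sigma(y)                              # d x d multiplicative-noise matrix at Y_t
        drift = -S @ S.T @ grad_V(y) + a_t**2 * upsilon(y)
        dW = rng.normal(scale=np.sqrt(dt), size=d)  # Brownian increment over one step
        y = y + drift * dt + a_t * (S @ dW)
        path.append(y.copy())
    return np.array(path)

# Toy usage: V(x) = ||x||^2 / 2, constant sigma = I, so the correction term vanishes.
path = euler_langevin_sa(
    grad_V=lambda x: x,
    sigma=lambda x: np.eye(2),
    upsilon=lambda x: np.zeros(2),
    y0=np.array([3.0, -2.0]),
    n_steps=10_000,
    dt=1e-2,
)
print(path[-1])  # expected to drift toward argmin(V) = {0} as a(t) decreases
```

In this constant-diffusion toy case the scheme reduces to the classical Langevin simulated annealing recursion; a position-dependent `sigma` together with the corresponding `upsilon` gives the multiplicative-noise setting the paper analyzes.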
