A Note on Subgeometric Rate Convergence for Ergodic Markov Chains in the Wasserstein Metric

Bulletin of Mathematical Sciences and Applications Pub Date : 2016-11-01 DOI:10.18052/WWW.SCIPRESS.COM/BMSA.17.40

Mokaedi V. Lekgari


Citations: 0

Abstract

We investigate subgeometric rate ergodicity for Markov chains in the Wasserstein metric and show that the finiteness of the expectation $E_{(i,j)}\big[\sum_{k=0}^{\tau_\triangle-1} r(k)\big]$, where $\tau_\triangle$ is the hitting time on the coupling set $\triangle$ and $r$ is a subgeometric rate function, is equivalent to a sequence of Foster-Lyapunov drift conditions which imply subgeometric convergence in the Wasserstein distance. We give an example for a 'family of nested drift conditions'.

Introduction and Notations

We start with a brief review of ergodicity. Let $\mathbb{Z}_+ = \{0, 1, 2, \ldots\}$, $\mathbb{N}_+ = \{1, 2, \ldots\}$, and $\mathbb{R}_+ = [0,\infty)$. Let $(\Phi_n)_{n\in\mathbb{Z}_+}$ denote a Markov chain with transition kernel $P$ on a countably generated state space denoted by $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. We write $P^n(i,j) = P_i(\Phi_n = j) = E_i[1_{\Phi_n = j}]$, where $P_i$ and $E_i$ respectively denote the probability and expectation of the chain under the condition that its initial state is $\Phi_0 = i$, and $1_A$ is the indicator function of the set $A$.

According to Markov's theorem, a Markov chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is ergodic if there is positive probability of passing from any state $i \in \mathcal{X}$ to any other state $j \in \mathcal{X}$ in one step. That is, the chain is ergodic if $P(i,j) > 0$ for all states $i, j \in \mathcal{X}$. The chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is also said to be (ordinary) ergodic if, for all $i \in \mathcal{X}$, $P^n(i,\cdot) \to \pi(\cdot)$ as $n \to \infty$, where the $\sigma$-finite measure $\pi$ is the invariant limit distribution of the chain.

The chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is referred to as geometrically ergodic if there exist a measurable function $V : \mathcal{X} \to (0,\infty)$ and constants $\beta < 1$ and $M < \infty$ such that

$$\|P^n(i,\cdot) - \pi(\cdot)\| \le M V(i)\beta^n, \quad \forall\, n \in \mathbb{N}_+,$$

where here and hereafter, for a (signed) measure $\mu$, we define $\mu(f) = \int \mu(dj) f(j)$, and the norm $\|\mu\|$ is defined by $\sup_{|g|\le f} |\mu(g)|$, whereas the total variation norm is defined similarly but with $f \equiv 1$. The Markov chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is strongly ergodic if

$$\lim_{n\to\infty} \sup_{i\in\mathcal{X}} \|P^n(i,\cdot) - \pi(\cdot)\| = 0.$$

Loosely speaking, subgeometric ergodicity, which we define next, is a kind of convergence that is faster than ordinary ergodicity but slower than geometric ergodicity.
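To make the geometric bound concrete, the following is a minimal sketch on a two-state chain. The kernel `P`, its invariant distribution `PI`, and the helper names `step`, `tv_norm` and `tv_errors` are illustrative assumptions and do not come from the note. The sketch evaluates $\|P^n(i,\cdot) - \pi(\cdot)\|$ with $f \equiv 1$ and estimates $\beta$ from successive ratios.

```python
# Illustrative two-state kernel (an assumption, not taken from the note).
P = [[0.9, 0.1],
     [0.2, 0.8]]
PI = [2.0 / 3.0, 1.0 / 3.0]  # invariant distribution: PI P = PI


def step(row, kernel):
    """Propagate a distribution one step: returns the row vector row * kernel."""
    size = len(kernel)
    return [sum(row[k] * kernel[k][j] for k in range(size)) for j in range(size)]


def tv_norm(mu, nu):
    """Total variation norm of mu - nu, i.e. sup over |g| <= 1 of |mu(g) - nu(g)|."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def tv_errors(kernel, pi, start, n_steps):
    """Return the list of ||P^n(start, .) - pi|| for n = 1, ..., n_steps."""
    row = [1.0 if j == start else 0.0 for j in range(len(kernel))]
    errors = []
    for _ in range(n_steps):
        row = step(row, kernel)
        errors.append(tv_norm(row, pi))
    return errors


errors = tv_errors(P, PI, start=0, n_steps=10)
# Successive ratios estimate the geometric rate beta.
ratios = [errors[n + 1] / errors[n] for n in range(len(errors) - 1)]
for n, err in enumerate(errors, start=1):
    print(n, err)
```

For this particular kernel the error equals $(2/3)\cdot 0.7^n$, so the bound holds with $M V(0) = 2/3$ and $\beta = 0.7$, the second eigenvalue $1 - 0.1 - 0.2$ of `P`.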
Let $r \in \Lambda_0$, where $\Lambda_0$ is the family of measurable increasing functions $r : \mathbb{R}_+ \to [1,\infty)$ satisfying $\frac{\log r(t)}{t} \downarrow 0$ as $t \uparrow \infty$. Let $\Lambda$ denote the class of positive functions $\bar{r} : \mathbb{R}_+ \to (0,\infty)$ such that for some $r \in \Lambda_0$ we have

$$0 < \liminf_{n} \frac{\bar{r}(n)}{r(n)} \le \limsup_{n} \frac{\bar{r}(n)}{r(n)} < \infty. \tag{1}$$

Indeed, (1) implies the equivalence of the class of functions $\Lambda_0$ with the class of functions $\Lambda$. An example of a function in the class $\Lambda$ is the rate $r(n) = \exp\big(s n^{1/(1+\alpha)}\big)$, $\alpha > 0$, $s > 0$. Without loss of generality we suppose that $r(0) = 1$ whenever $r \in \Lambda$.

Bulletin of Mathematical Sciences and Applications, ISSN: 2278-9634, Vol. 17, pp 40-45. Submitted: 2016-08-30. Revised: 2016-10-10. Accepted: 2016-10-17. Online: 2016-11-01. doi:10.18052/www.scipress.com/BMSA.17.40. 2016 SciPress Ltd, Switzerland. SciPress applies the CC-BY 4.0 license to works we publish: https://creativecommons.org/licenses/by/4.0/

The properties of $r \in \Lambda_0$ which follow from (1) and are used frequently in this study are

$$r(x+y) \le r(x)\,r(y) \quad \forall\, x, y \in \mathbb{R}_+, \tag{2}$$

$$\frac{r(x+a)}{r(x)} \to 1 \ \text{ as } x \to \infty, \ \text{ for each } a \in \mathbb{R}_+. \tag{3}$$

$\Lambda$ is referred to as the class of subgeometric rate functions (cf. [3]).

Let $r \in \Lambda$; then the ergodic chain $\Phi_n$ is said to be subgeometrically ergodic of order $r$ in the $f$-norm (or simply $(f, r)$-ergodic) if, for the unique invariant distribution $\pi$ of the process and for all $i \in \mathcal{X}$,

$$\lim_{n\to+\infty} r(n)\,\|P^n(i,\cdot) - \pi(\cdot)\|_f = 0, \tag{4}$$

where $\|\sigma\|_f = \sup_{|g|\le f} |\sigma(g)|$ and $f : \mathcal{X} \to [1,\infty)$ is a measurable function. Also, for subgeometric ergodicity to hold it is necessary that there exist a deterministic sequence $\{V_n\}$ of functions $V_n : \mathcal{X} \to [1,\infty)$ which satisfy the Foster-Lyapunov drift condition

$$P V_{n+1} \le V_n - r(n) f + b\, r(n) 1_C, \quad n \in \mathbb{Z}_+, \tag{5}$$

for a petite set $C \in \mathcal{B}(\mathcal{X})$ and a constant $b \in \mathbb{R}_+$ such that $\sup_C V_0 < \infty$. The Foster-Lyapunov drift conditions provide bounds on the return time to accessible sets, thereby giving some control over the dynamics of the Markov process by focusing on the hitting times on a particular set.
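Properties (2) and (3) of rate functions can be checked numerically. The sketch below assumes, purely for illustration, a stretched-exponential rate $r(t) = \exp(s t^{\gamma})$ with $0 < \gamma < 1$. The parameter names `s` and `gamma` and the helper functions are hypothetical. All computations are done on $\log r$ so that large arguments do not overflow.

```python
import math


def log_rate(t, s=1.0, gamma=0.5):
    """log r(t) for the stretched-exponential rate r(t) = exp(s * t**gamma)."""
    return s * t ** gamma


def is_submultiplicative(xs, s=1.0, gamma=0.5):
    """Check property (2), r(x + y) <= r(x) r(y), in logarithmic form on a grid."""
    return all(
        log_rate(x + y, s, gamma)
        <= log_rate(x, s, gamma) + log_rate(y, s, gamma) + 1e-12
        for x in xs
        for y in xs
    )


def shift_ratio(x, a, s=1.0, gamma=0.5):
    """r(x + a) / r(x), computed through logarithms to avoid overflow."""
    return math.exp(log_rate(x + a, s, gamma) - log_rate(x, s, gamma))


grid = [0.5 * k for k in range(0, 101)]  # points in [0, 50]
print(is_submultiplicative(grid))        # property (2) on the grid
for x in (1e2, 1e4, 1e6):
    # log r(x) / x should decrease, and r(x + a) / r(x) should approach 1.
    print(x, log_rate(x) / x, shift_ratio(x, a=5.0))
# Geometric contrast: for r(t) = exp(s * t) the ratio is exp(s * a) for every x.
print(shift_ratio(1e6, a=5.0, gamma=1.0))
```

With `gamma=1.0` the rate is geometric and the ratio in (3) stays at $e^{sa}$ instead of tending to 1, which is why such rates fall outside $\Lambda_0$.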
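Convergence in the Wasserstein distance is a very interesting research area, through which [1], amongst other authors, suggested a new technique for establishing subgeometric ergodicity. Following [1], we define the Wasserstein distance as follows. Let $(\mathcal{X}, d)$ be a Polish space, where $d$ is a distance bounded by 1, and let $\mathcal{P}(\mathcal{X})$ denote the set of all probability measures on the state space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. Let $\mu, \nu \in \mathcal{P}(\mathcal{X})$; $\lambda$ is a coupling of $\mu$ and $\nu$ if $\lambda$ is a probability measure on the product space $(\mathcal{X}\times\mathcal{X}, \mathcal{B}(\mathcal{X}\times\mathcal{X}))$ such that $\lambda(A\times\mathcal{X}) = \mu(A)$ and $\lambda(\mathcal{X}\times A) = \nu(A)$ for all $A \in \mathcal{B}(\mathcal{X})$. We further let $\mathcal{C}(\mu,\nu)$ be the set of all probability measures on $(\mathcal{X}\times\mathcal{X}, \mathcal{B}(\mathcal{X}\times\mathcal{X}))$ with marginals $\mu$ and $\nu$, and let $Q$ be the coupling Markov kernel on $(\mathcal{X}\times\mathcal{X}, \mathcal{B}(\mathcal{X}\times\mathcal{X}))$ such that, for every $i, j \in \mathcal{X}$, $Q((i,j),\cdot)$ is a coupling of $P(i,\cdot)$ and $P(j,\cdot)$.

The Wasserstein metric associated with the semimetric $d$ on $\mathcal{X}$, between two probability measures $\mu$ and $\nu$, is then given as

$$W_d(\mu,\nu) := \inf_{\gamma\in\mathcal{C}(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{X}} d(i,j)\, d\gamma(i,j).$$

When $d$ is the trivial metric $d_0(i,j) = 1_{i\neq j}$, the associated Wasserstein metric coincides, up to a factor of 2, with the total variation metric: $W_{d_0}(\mu,\nu) = \tfrac{1}{2} d_{TV}(\mu,\nu)$, where $d_{TV}(\mu,\nu) := 2\sup_{C\in\mathcal{B}(\mathcal{X})} |\mu(C) - \nu(C)|$, $\mu, \nu \in \mathcal{P}(\mathcal{X})$.

A set $C$ is said to be small if there exists a constant $\varepsilon > 0$ such that, for all $i, j \in C$, $\tfrac{1}{2} d_{TV}(P(i,\cdot), P(j,\cdot)) \le 1 - \varepsilon$. A set $C \in \mathcal{B}(\mathcal{X})$ is petite if there exist some non-trivial measure $\nu_a$ on $\mathcal{B}(\mathcal{X})$ and some probability distribution $a = \{a_n : n \in \mathbb{Z}_+\}$ such that

$$\sum_{n=1}^{\infty} a_n P^n(x,\cdot) \ge \nu_a(\cdot), \quad \forall\, x \in C. \tag{6}$$

Petite sets generalize small sets.

The first hitting time on a small set $C$ delayed by a constant $\delta > 0$ is given by $\tau_C^{\delta} = \inf\{n \ge \delta : \Phi_n \in C\}$. We also have $\tau_C^{+} = \inf\{n \ge J_1 : \Phi_n \in C\}$ as the first hitting time on the set $C$ after the first jump $J_1$ of the process. We note that $\tau_C^{+} = \tau_C$ if $\Phi_0 \notin C$. In the case when $\delta = 0$ we have $\tau_C^{0} = \tau_C$. If $C$ is a singleton consisting only of state $i$, then we write $\tau_i^{\delta}$ for $\tau_C^{\delta}$ and equivalently $\tau_i^{+}$ for $\tau_C^{+}$.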
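The coupling definition of $W_d$ can be made concrete on a finite state space. The sketch below is an illustration under stated assumptions: the distributions `mu` and `nu` and all function names are hypothetical. It builds a coupling that keeps as much mass as possible on the diagonal, evaluates its cost under the trivial metric $d_0$, and compares the result with $\sup_C |\mu(C) - \nu(C)|$. By the coupling inequality no coupling can have a smaller cost, so this cost is $W_{d_0}(\mu,\nu)$. Note the factor 2 in the normalisation $d_{TV} = 2\sup_C|\mu(C)-\nu(C)|$.

```python
def optimal_trivial_coupling(mu, nu):
    """Build a coupling of mu and nu that maximises the mass on the diagonal.

    Returns a matrix gamma whose row sums are mu and whose column sums are nu.
    """
    size = len(mu)
    common = [min(m, v) for m, v in zip(mu, nu)]
    excess_mu = [m - c for m, c in zip(mu, common)]  # mass mu has to move away
    excess_nu = [v - c for v, c in zip(nu, common)]  # mass nu still has to receive
    total = sum(excess_mu)
    gamma = [[0.0] * size for _ in range(size)]
    for i in range(size):
        gamma[i][i] = common[i]
        for j in range(size):
            if total > 0.0:
                # excess_mu[i] * excess_nu[i] is always 0, so the diagonal is untouched.
                gamma[i][j] += excess_mu[i] * excess_nu[j] / total
    return gamma


def coupling_cost(gamma):
    """Integral of d0(i, j) = 1_{i != j} against gamma: the off-diagonal mass."""
    size = len(gamma)
    return sum(gamma[i][j] for i in range(size) for j in range(size) if i != j)


def sup_set_difference(mu, nu):
    """sup over sets C of |mu(C) - nu(C)|, attained at C = {i : mu_i > nu_i}."""
    return sum(max(m - v, 0.0) for m, v in zip(mu, nu))


mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
gamma = optimal_trivial_coupling(mu, nu)
w_d0 = coupling_cost(gamma)
d_tv = 2.0 * sup_set_difference(mu, nu)
print(w_d0, d_tv)
```

For these two distributions the coupling cost is 0.3 and $d_{TV}$ is 0.6, up to floating-point rounding.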
It is worth noting that finite mean return times $E_i[\tau_i^{+}] < \infty$ guarantee ergodicity, or the existence of a stationary probability, and the convergence $P^n(i,j) - \pi \to 0$
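The link between mean return times and the stationary probability can be illustrated on a finite chain. The sketch below assumes a discrete-time chain, for which $\tau_i^{+} = \inf\{n \ge 1 : \Phi_n = i\}$, and uses a hypothetical three-state kernel `P` with invariant distribution `PI`. It computes $E_i[\tau_i^{+}]$ by first-step analysis and compares it with $1/\pi(i)$ (Kac's formula for irreducible positive recurrent chains).

```python
def mean_return_time(kernel, state, tol=1e-13, max_iter=100000):
    """Mean of tau_state^+ = inf{n >= 1 : Phi_n = state} when started at `state`.

    h[j] approximates E_j[first hitting time of `state`] for j != state, obtained
    by iterating h <- 1 + Q h, where Q is the kernel restricted to the other states.
    """
    size = len(kernel)
    others = [j for j in range(size) if j != state]
    h = {j: 0.0 for j in others}
    for _ in range(max_iter):
        new_h = {j: 1.0 + sum(kernel[j][k] * h[k] for k in others) for j in others}
        change = max(abs(new_h[j] - h[j]) for j in others)
        h = new_h
        if change < tol:
            break
    # First-step analysis from `state` itself.
    return 1.0 + sum(kernel[state][k] * h[k] for k in others)


# Illustrative birth-death kernel (an assumption, not taken from the note).
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
PI = [0.25, 0.50, 0.25]  # invariant distribution of P

for i in range(3):
    print(i, mean_return_time(P, i), 1.0 / PI[i])
```

The fixed-point iteration is chosen here only to keep the example free of external dependencies; solving the linear system $(I - Q)h = 1$ directly would work equally well.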
