A Note on Subgeometric Rate Convergence for Ergodic Markov Chains in the Wasserstein Metric
Mokaedi V. Lekgari
Bulletin of Mathematical Sciences and Applications, ISSN 2278-9634, Vol. 17, pp. 40-45. doi:10.18052/www.scipress.com/BMSA.17.40
Submitted: 2016-08-30. Revised: 2016-10-10. Accepted: 2016-10-17. Online: 2016-11-01.
2016 SciPress Ltd, Switzerland. SciPress applies the CC-BY 4.0 license to works we publish: https://creativecommons.org/licenses/by/4.0/
We investigate subgeometric rate ergodicity for Markov chains in the Wasserstein metric and show that the finiteness of the expectation $E_{(i,j)}\big[\sum_{k=0}^{\tau_\triangle - 1} r(k)\big]$, where $\tau_\triangle$ is the hitting time on the coupling set $\triangle$ and $r$ is a subgeometric rate function, is equivalent to a sequence of Foster-Lyapunov drift conditions which imply subgeometric convergence in the Wasserstein distance. We give an example for a 'family of nested drift conditions'.

Introduction and Notations

We start with a brief review of ergodicity. Let $\mathbb{Z}_+ = \{0, 1, 2, \dots\}$, $\mathbb{N}_+ = \{1, 2, \dots\}$, and $\mathbb{R}_+ = [0,\infty)$. Let $(\Phi_n)_{n\in\mathbb{Z}_+}$ denote a Markov chain with transition kernel $P$ on a countably generated state space denoted by $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. We write $P^n(i,j) = P_i(\Phi_n = j) = E_i[1_{\Phi_n = j}]$, where $P_i$ and $E_i$ respectively denote the probability and expectation of the chain under the condition that its initial state is $\Phi_0 = i$, and $1_A$ is the indicator function of the set $A$.

According to Markov's theorem, a Markov chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is ergodic if there is positive probability of passing from any state $i \in \mathcal{X}$ to any other state $j \in \mathcal{X}$ in one step, that is, if $P(i,j) > 0$ for all $i, j \in \mathcal{X}$. The chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is also said to be (ordinary) ergodic if, for all $i \in \mathcal{X}$, $P^n(i,\cdot) \to \pi(\cdot)$ as $n \to \infty$, where the $\sigma$-finite measure $\pi$ is the invariant limit distribution of the chain.

The chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is referred to as geometrically ergodic if there exist some measurable function $V : \mathcal{X} \to (0,\infty)$ and constants $\beta < 1$ and $M < \infty$ such that

$$\|P^n(i,\cdot) - \pi(\cdot)\| \le M V(i) \beta^n, \quad \forall\, n \in \mathbb{N}_+,$$

where here and hereafter, for a (signed) measure $\mu$, we define $\mu(f) = \int \mu(dj) f(j)$, and the norm $\|\mu\|$ is defined by $\sup_{|g| \le f} |\mu(g)|$, whereas the total variation norm is defined similarly but with $f \equiv 1$. The Markov chain $(\Phi_n)_{n\in\mathbb{Z}_+}$ is strongly ergodic if

$$\lim_{n\to\infty} \sup_{i\in\mathcal{X}} \|P^n(i,\cdot) - \pi(\cdot)\| = 0.$$

Loosely speaking, subgeometric ergodicity, which we define next, is a kind of convergence that is faster than ordinary ergodicity but slower than geometric ergodicity.
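As a small illustration of the geometric bound $\|P^n(i,\cdot) - \pi(\cdot)\| \le M V(i) \beta^n$ in the total variation norm ($f \equiv 1$), the sketch below iterates a two-state kernel and compares the distance to stationarity with the bound. The kernel `P`, the distribution `pi`, and the constants `M`, `beta`, `V` are illustrative assumptions chosen for this example and do not come from the paper.

```python
def step(dist, P):
    """One step of the chain: row vector `dist` times transition matrix `P`."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_norm(mu, nu):
    """Norm of mu - nu with f = 1: on a finite space this is sum_j |mu(j) - nu(j)|."""
    return sum(abs(a - b) for a, b in zip(mu, nu))


def distances_to_stationarity(P, pi, start, steps):
    """Return [||P^n(start, .) - pi||] for n = 1, ..., steps."""
    dist = [1.0 if j == start else 0.0 for j in range(len(P))]
    out = []
    for _ in range(steps):
        dist = step(dist, P)
        out.append(tv_norm(dist, pi))
    return out


# Hypothetical two-state kernel (an assumption for illustration only).
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]          # satisfies pi P = pi for the kernel above

# Assumed constants for the bound M * V(i) * beta**n.
M, beta = 1.0, 0.5       # 0.5 is the second eigenvalue of P (0.9 + 0.6 - 1)
V = [0.4, 1.6]           # V(i) = ||delta_i - pi|| for this kernel

if __name__ == "__main__":
    for i in range(2):
        for n, d in enumerate(distances_to_stationarity(P, pi, i, 8), start=1):
            print(i, n, d, M * V[i] * beta ** n)
```

For this particular kernel the distance halves at every step, so the bound holds with equality; a chain with a subgeometric rate would instead decay more slowly than any $\beta^n$ with $\beta < 1$.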
Let $r \in \Lambda_0$, where $\Lambda_0$ is the family of measurable increasing functions $r : \mathbb{R}_+ \to [1,\infty)$ satisfying $\log r(t)/t \downarrow 0$ as $t \uparrow \infty$. Let $\Lambda$ denote the class of positive functions $\bar{r} : \mathbb{R}_+ \to (0,\infty)$ such that for some $r \in \Lambda_0$ we have

$$0 < \liminf_{n} \frac{\bar{r}(n)}{r(n)} \le \limsup_{n} \frac{\bar{r}(n)}{r(n)} < \infty. \qquad (1)$$

Indeed (1) implies the equivalence of the class of functions $\Lambda_0$ with the class of functions $\Lambda$. An example of a function in the class $\Lambda$ is the rate $r(n) = \exp(s n^{1/(1+\alpha)})$, $\alpha > 0$, $s > 0$. Without loss of generality we suppose that $r(0) = 1$ whenever $r \in \Lambda$. The properties of $r \in \Lambda_0$ which follow from (1) and are to be used frequently in this study are:

$$r(x+y) \le r(x) r(y) \quad \forall\, x, y \in \mathbb{R}_+, \qquad (2)$$

$$\frac{r(x+a)}{r(x)} \to 1 \ \text{as } x \to \infty, \ \text{for each } a \in \mathbb{R}_+. \qquad (3)$$

$\Lambda$ is referred to as the class of subgeometric rate functions (cf. [3]). Let $r \in \Lambda$; then the ergodic chain $\Phi_n$ is said to be subgeometrically ergodic of order $r$ in the $f$-norm (or simply $(f, r)$-ergodic) if, for the unique invariant distribution $\pi$ of the process and for all $i \in \mathcal{X}$,

$$\lim_{n\to+\infty} r(n) \|P^n(i,\cdot) - \pi(\cdot)\|_f = 0, \qquad (4)$$

where $\|\sigma\|_f = \sup_{|g| \le f} |\sigma(g)|$ and $f : \mathcal{X} \to [1,\infty)$ is a measurable function. Also, for subgeometric ergodicity to hold it is necessary that there exist a deterministic sequence $\{V_n\}$ of functions $V_n : \mathcal{X} \to [1,\infty)$ which satisfy the Foster-Lyapunov drift condition

$$P V_{n+1} \le V_n - r(n) f + b\, r(n) 1_C, \quad n \in \mathbb{Z}_+, \qquad (5)$$

for a petite set $C \in \mathcal{B}(\mathcal{X})$ and a constant $b \in \mathbb{R}_+$ such that $\sup_C V_0 < \infty$. The Foster-Lyapunov drift conditions provide bounds on the return time to accessible sets, thereby giving some control over the dynamics of the Markov process by focusing on the hitting times on a particular set.
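The defining property of $\Lambda_0$ and properties (2) and (3) can be checked numerically for a concrete rate. The sketch below assumes the rate $r(t) = \exp(s\, t^{1/(1+\alpha)})$ with the illustrative values $s = 1$, $\alpha = 1$; the helper names are hypothetical, and everything is computed through $\log r$ to avoid floating-point overflow for large $t$.

```python
import math


def log_r(t, s=1.0, alpha=1.0):
    """log r(t) for the assumed rate r(t) = exp(s * t**(1/(1+alpha)))."""
    return s * t ** (1.0 / (1.0 + alpha))


def log_rate_over_t(t, s=1.0, alpha=1.0):
    """The quantity log r(t) / t, which should decrease to 0 as t grows."""
    return log_r(t, s, alpha) / t


def is_submultiplicative(points, s=1.0, alpha=1.0, tol=1e-12):
    """Check property (2), r(x+y) <= r(x) r(y), on a finite grid (in log form)."""
    return all(
        log_r(x + y, s, alpha) <= log_r(x, s, alpha) + log_r(y, s, alpha) + tol
        for x in points
        for y in points
    )


def shift_ratio(x, a, s=1.0, alpha=1.0):
    """The ratio r(x+a) / r(x) from property (3), computed from log r."""
    return math.exp(log_r(x + a, s, alpha) - log_r(x, s, alpha))


if __name__ == "__main__":
    print([log_rate_over_t(t) for t in (1.0, 10.0, 100.0, 1000.0)])
    print(is_submultiplicative([0.0, 0.5, 1.0, 7.0, 50.0, 1e4]))
    print([shift_ratio(x, 5.0) for x in (1e2, 1e4, 1e6)])
```

A grid check of this kind is only a sanity test, not a proof; for this rate, (2) follows from the inequality $(x+y)^p \le x^p + y^p$ for $0 < p \le 1$.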
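Convergence in the Wasserstein distance is an active research area, through which [1], amongst other authors, suggested a new technique for establishing subgeometric ergodicity. Following [1] we define the Wasserstein distance as follows. Let $(\mathcal{X}, d)$ be a Polish space where $d$ is a distance bounded by 1, and let $\mathcal{P}(\mathcal{X})$ denote the set of all probability measures on the state space $(\mathcal{X}, \mathcal{B}(\mathcal{X}))$. Let $\mu, \nu \in \mathcal{P}(\mathcal{X})$; $\lambda$ is a coupling of $\mu$ and $\nu$ if $\lambda$ is a probability on the product space $(\mathcal{X} \times \mathcal{X}, \mathcal{B}(\mathcal{X} \times \mathcal{X}))$ such that $\lambda(A \times \mathcal{X}) = \mu(A)$ and $\lambda(\mathcal{X} \times A) = \nu(A)$ for all $A \in \mathcal{B}(\mathcal{X})$. We further let $\mathcal{C}(\mu, \nu)$ be the set of all probability measures on $(\mathcal{X} \times \mathcal{X}, \mathcal{B}(\mathcal{X} \times \mathcal{X}))$ with marginals $\mu$ and $\nu$, and $Q$ be a coupling Markov kernel on $(\mathcal{X} \times \mathcal{X}, \mathcal{B}(\mathcal{X} \times \mathcal{X}))$ such that, for every $i, j \in \mathcal{X}$, $Q((i,j), \cdot)$ is a coupling of $P(i,\cdot)$ and $P(j,\cdot)$. The Wasserstein metric associated with the semimetric $d$ on $\mathcal{X}$, between two probability measures $\mu$ and $\nu$, is then given as

$$W_d(\mu, \nu) := \inf_{\gamma \in \mathcal{C}(\mu,\nu)} \int_{\mathcal{X} \times \mathcal{X}} d(i,j)\, d\gamma(i,j).$$

When $d$ is the trivial metric $d_0(i,j) = 1_{i \ne j}$, the associated Wasserstein metric coincides, up to a factor of 2, with the total variation metric: $W_{d_0}(\mu, \nu) = \tfrac{1}{2} d_{TV}(\mu, \nu)$, where $d_{TV}(\mu, \nu) := 2 \sup_{C \in \mathcal{B}(\mathcal{X})} |\mu(C) - \nu(C)|$, $\mu, \nu \in \mathcal{P}(\mathcal{X})$.

A set $C$ is said to be small if there exists a constant $\varepsilon > 0$ such that for all $i, j \in C$,

$$\tfrac{1}{2} d_{TV}(P(i,\cdot), P(j,\cdot)) \le 1 - \varepsilon.$$

A set $C \in \mathcal{B}(\mathcal{X})$ is petite if there exist some non-trivial measure $\nu_a$ on $\mathcal{B}(\mathcal{X})$ and some probability distribution $a = \{a_n : n \in \mathbb{Z}_+\}$ such that

$$\sum_{n=1}^{\infty} a_n P^n(x, \cdot) \ge \nu_a(\cdot), \quad \forall\, x \in C. \qquad (6)$$

Petite sets generalize small sets.

The first hitting time on a small set $C$ delayed by a constant $\delta > 0$ is given by $\tau_C^\delta = \inf\{n \ge \delta : \Phi_n \in C\}$. We also have $\tau_C^+ = \inf\{n \ge J_1 : \Phi_n \in C\}$ as the first hitting time on the set $C$ after the first jump $J_1$ of the process. We note that $\tau_C^+ = \tau_C$ if $\Phi_0 \notin C$. In the case when $\delta = 0$ we have $\tau_C^0 = \tau_C$. If $C$ is a singleton consisting only of the state $i$, then we write $\tau_i^\delta$ for $\tau_C^\delta$ and likewise $\tau_i^+$ for $\tau_C^+$.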
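On a finite state space the relation between $W_{d_0}$ and total variation can be checked directly. The sketch below builds a maximal coupling of two distributions, verifies its marginals, and compares its cost under the trivial metric $d_0$ with a brute-force evaluation of $\sup_C |\mu(C) - \nu(C)|$, which is one half of $d_{TV}$ under the normalization $d_{TV} = 2\sup_C |\mu(C) - \nu(C)|$. The function names and the example distributions are assumptions made for illustration. The construction is optimal because any coupling $\gamma$ satisfies $\gamma(i,i) \le \min(\mu(i), \nu(i))$.

```python
from itertools import combinations


def maximal_coupling(mu, nu):
    """Coupling of mu and nu that puts mass min(mu_i, nu_i) on the diagonal."""
    n = len(mu)
    diag = [min(a, b) for a, b in zip(mu, nu)]
    res_mu = [a - m for a, m in zip(mu, diag)]   # mass of mu left to place
    res_nu = [b - m for b, m in zip(nu, diag)]   # mass of nu left to place
    c = sum(res_mu)                              # total off-diagonal mass
    gamma = [[0.0] * n for _ in range(n)]
    for i in range(n):
        gamma[i][i] = diag[i]
        if c > 0:
            for j in range(n):
                if i != j:
                    # res_mu[i] * res_nu[i] = 0, so the diagonal is unaffected
                    gamma[i][j] = res_mu[i] * res_nu[j] / c
    return gamma


def cost_trivial_metric(gamma):
    """Integral of d0(i, j) = 1_{i != j} against gamma: the off-diagonal mass."""
    n = len(gamma)
    return sum(gamma[i][j] for i in range(n) for j in range(n) if i != j)


def sup_over_sets(mu, nu):
    """Brute-force sup over all subsets C of |mu(C) - nu(C)| (small n only)."""
    n = len(mu)
    best = 0.0
    for k in range(n + 1):
        for C in combinations(range(n), k):
            best = max(best, abs(sum(mu[i] - nu[i] for i in C)))
    return best


if __name__ == "__main__":
    mu = [0.5, 0.3, 0.2]   # hypothetical distributions
    nu = [0.2, 0.3, 0.5]
    gamma = maximal_coupling(mu, nu)
    print(cost_trivial_metric(gamma), sup_over_sets(mu, nu))
```

The brute-force supremum enumerates all $2^n$ subsets, so it is only meant for very small state spaces.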
It is worth noting that finite mean return times $E_i[\tau_i^+] < \infty$ guarantee ergodicity, or the existence of a stationary probability, and the convergence $P^n(i,j) - \pi(j) \to 0$