Pietro Caputo, Zongchen Chen, Yuzhou Gu, Yury Polyanskiy
{"title":"马尔可夫链中的熵收缩:半步、全步和连续时间","authors":"Pietro Caputo, Zongchen Chen, Yuzhou Gu, Yury Polyanskiy","doi":"arxiv-2409.07689","DOIUrl":null,"url":null,"abstract":"This paper considers the speed of convergence (mixing) of a finite Markov\nkernel $P$ with respect to the Kullback-Leibler divergence (entropy). Given a\nMarkov kernel one defines either a discrete-time Markov chain (with the\n$n$-step transition kernel given by the matrix power $P^n$) or a\ncontinuous-time Markov process (with the time-$t$ transition kernel given by\n$e^{t(P-\\mathrm{Id})}$). The contraction of entropy for $n=1$ or $t=0+$ are\ncharacterized by the famous functional inequalities, the strong data processing\ninequality (SDPI) and the modified log-Sobolev inequality (MLSI), respectively.\nWhen $P=KK^*$ is written as the product of a kernel and its adjoint, one could\nalso consider the ``half-step'' contraction, which is the SDPI for $K$, while\nthe ``full-step'' contraction refers to the SDPI for $P$. The work [DMLM03]\nclaimed that these contraction coefficients (half-step, full-step, and\ncontinuous-time) are generally within a constant factor of each other. We\ndisprove this and related conjectures by working out a number of different\ncounterexamples. In particular, we construct (a) a continuous-time Markov\nprocess that contracts arbitrarily faster than its discrete-time counterpart;\nand (b) a kernel $P$ such that $P^{m+1}$ contracts arbitrarily better than\n$P^m$. Hence, our main conclusion is that the four standard inequalities\ncomparing five common notions of entropy and variance contraction are generally\nnot improvable. In the process of analyzing the counterexamples, we survey and sharpen the\ntools for bounding the contraction coefficients and characterize properties of\nextremizers of the respective functional inequalities. 
As our examples range\nfrom Bernoulli-Laplace model, random walks on graphs, to birth-death chains,\nthe paper is also intended as a tutorial on computing MLSI, SDPI and other\nconstants for these types of commonly occurring Markov chains.","PeriodicalId":501245,"journal":{"name":"arXiv - MATH - Probability","volume":"106 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Entropy Contractions in Markov Chains: Half-Step, Full-Step and Continuous-Time\",\"authors\":\"Pietro Caputo, Zongchen Chen, Yuzhou Gu, Yury Polyanskiy\",\"doi\":\"arxiv-2409.07689\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This paper considers the speed of convergence (mixing) of a finite Markov\\nkernel $P$ with respect to the Kullback-Leibler divergence (entropy). Given a\\nMarkov kernel one defines either a discrete-time Markov chain (with the\\n$n$-step transition kernel given by the matrix power $P^n$) or a\\ncontinuous-time Markov process (with the time-$t$ transition kernel given by\\n$e^{t(P-\\\\mathrm{Id})}$). The contraction of entropy for $n=1$ or $t=0+$ are\\ncharacterized by the famous functional inequalities, the strong data processing\\ninequality (SDPI) and the modified log-Sobolev inequality (MLSI), respectively.\\nWhen $P=KK^*$ is written as the product of a kernel and its adjoint, one could\\nalso consider the ``half-step'' contraction, which is the SDPI for $K$, while\\nthe ``full-step'' contraction refers to the SDPI for $P$. The work [DMLM03]\\nclaimed that these contraction coefficients (half-step, full-step, and\\ncontinuous-time) are generally within a constant factor of each other. We\\ndisprove this and related conjectures by working out a number of different\\ncounterexamples. 
In particular, we construct (a) a continuous-time Markov\\nprocess that contracts arbitrarily faster than its discrete-time counterpart;\\nand (b) a kernel $P$ such that $P^{m+1}$ contracts arbitrarily better than\\n$P^m$. Hence, our main conclusion is that the four standard inequalities\\ncomparing five common notions of entropy and variance contraction are generally\\nnot improvable. In the process of analyzing the counterexamples, we survey and sharpen the\\ntools for bounding the contraction coefficients and characterize properties of\\nextremizers of the respective functional inequalities. As our examples range\\nfrom Bernoulli-Laplace model, random walks on graphs, to birth-death chains,\\nthe paper is also intended as a tutorial on computing MLSI, SDPI and other\\nconstants for these types of commonly occurring Markov chains.\",\"PeriodicalId\":501245,\"journal\":{\"name\":\"arXiv - MATH - Probability\",\"volume\":\"106 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-12\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - MATH - Probability\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.07689\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - MATH - Probability","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.07689","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Entropy Contractions in Markov Chains: Half-Step, Full-Step and Continuous-Time
This paper considers the speed of convergence (mixing) of a finite Markov
kernel $P$ with respect to the Kullback-Leibler divergence (entropy). Given a
Markov kernel one defines either a discrete-time Markov chain (with the
$n$-step transition kernel given by the matrix power $P^n$) or a
continuous-time Markov process (with the time-$t$ transition kernel given by
$e^{t(P-\mathrm{Id})}$). The contractions of entropy for $n=1$ and $t=0+$ are
characterized by two well-known functional inequalities: the strong data processing
inequality (SDPI) and the modified log-Sobolev inequality (MLSI), respectively.
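For reference, these two objects can be written down explicitly. The following is a standard formulation (normalization conventions vary across the literature, so treat this as a sketch rather than the paper's exact conventions); here $\pi$ is the stationary distribution of $P$ and $D(\cdot\,\|\,\cdot)$ is KL divergence.

```latex
% Full-step SDPI (contraction) coefficient of P:
\eta_{\mathrm{KL}}(P) = \sup_{\mu \neq \pi} \frac{D(\mu P \,\|\, \pi)}{D(\mu \,\|\, \pi)},
\qquad \text{so that} \quad D(\mu P^n \,\|\, \pi) \le \eta_{\mathrm{KL}}(P)^n \, D(\mu \,\|\, \pi).

% MLSI constant: the largest \rho > 0 such that, for all f > 0,
\rho \, \operatorname{Ent}_\pi(f) \le \mathcal{E}(f, \log f),
\qquad \text{where} \quad \operatorname{Ent}_\pi(f) = \mathbb{E}_\pi[f \log f] - \mathbb{E}_\pi[f] \log \mathbb{E}_\pi[f]

% and \mathcal{E}(f,g) = \langle f, (\mathrm{Id}-P)\, g \rangle_\pi is the Dirichlet form.
```

The first inequality follows because $\eta_{\mathrm{KL}}$ is submultiplicative and $\pi P = \pi$; the MLSI constant plays the analogous role in continuous time, giving $D(\mu e^{t(P-\mathrm{Id})} \,\|\, \pi) \le e^{-\rho t} D(\mu \,\|\, \pi)$.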
When $P=KK^*$ is written as the product of a kernel and its adjoint, one can
also consider the "half-step" contraction, namely the SDPI for $K$, while
the "full-step" contraction refers to the SDPI for $P$. The work [DMLM03]
claimed that these contraction coefficients (half-step, full-step, and
continuous-time) are generally within a constant factor of each other. We
disprove this and related conjectures by working out a number of different
counterexamples. In particular, we construct (a) a continuous-time Markov
process that contracts arbitrarily faster than its discrete-time counterpart;
and (b) a kernel $P$ such that $P^{m+1}$ contracts arbitrarily better than
$P^m$. Hence, our main conclusion is that the four standard inequalities
comparing five common notions of entropy and variance contraction are generally
not improvable. In the process of analyzing the counterexamples, we survey and sharpen the
tools for bounding the contraction coefficients and characterize properties of
extremizers of the respective functional inequalities. As our examples range
from the Bernoulli-Laplace model and random walks on graphs to birth-death chains,
the paper is also intended as a tutorial on computing MLSI, SDPI and other
constants for these types of commonly occurring Markov chains.
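As a concrete illustration of the quantities discussed above, the sketch below (not from the paper; the toy chain, function names, and the crude random-search strategy are our own choices) numerically lower-bounds the KL contraction coefficient of a small reversible chain and of its square. By the data processing inequality, each one-step ratio dominates the corresponding two-step ratio, so the estimate for $P^2$ never exceeds the one for $P$.

```python
import numpy as np

def kl(p, q):
    """Kullback-Leibler divergence D(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def kl_contraction(P, pi, n_trials=2000, seed=None):
    """Crude numerical lower bound on the SDPI coefficient
    eta_KL(P) = sup_mu D(mu P || pi) / D(mu || pi),
    obtained by random search over input distributions mu.
    (pi is assumed stationary for P, so mu P is compared against pi itself.)"""
    rng = np.random.default_rng(seed)
    k = len(pi)
    best = 0.0
    for _ in range(n_trials):
        mu = rng.dirichlet(np.ones(k))       # random distribution on k states
        d = kl(mu, pi)
        if d > 1e-12:                        # skip mu essentially equal to pi
            best = max(best, kl(mu @ P, pi) / d)
    return best

# Toy example: lazy random walk on a triangle (symmetric, so pi is uniform).
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
pi = np.ones(3) / 3

# Same seed => same trial distributions mu, so the comparison is pointwise fair.
eta1 = kl_contraction(P, pi, seed=0)         # estimate for one step of P
eta2 = kl_contraction(P @ P, pi, seed=0)     # estimate for P^2 (two steps)
print(f"eta_KL(P)   >= {eta1:.4f}")
print(f"eta_KL(P^2) >= {eta2:.4f}")
```

Since these are lower bounds from random search, they understate the true coefficients; the paper's point is precisely that relations such as $\eta(P^{m+1})$ versus $\eta(P^m)$ can behave far more wildly than such small examples suggest.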