From Letters to Words and Back: Invertible Coding of Stationary Measures

IF 2.2 | Zone 3 (Computer Science) | JCR Q3: COMPUTER SCIENCE, INFORMATION SYSTEMS
Łukasz Dębowski
Journal: IEEE Transactions on Information Theory, 71(6), pp. 4306-4316
Published: 2025-04-17
Type: Journal Article
DOI: 10.1109/TIT.2025.3562063
URL: https://ieeexplore.ieee.org/document/10969106/
Citations: 0

Abstract

Motivated by problems of statistical language modeling, we consider probability measures on infinite sequences over two countable alphabets of different cardinalities, such as letters and words. We introduce an invertible mapping between such measures, called the normalized transport, that preserves both stationarity and ergodicity. The normalized transport applies so-called self-avoiding codes that generalize comma-separated codes and specialize bijective stationary codes. The normalized transport is also connected to the usual measure transport via underlying asymptotically mean stationary measures. It preserves the ergodic decomposition. The normalized transport and self-avoiding codes arise in the problem of successive recurrence times. In particular, we show that successive recurrence times are ergodic for an ergodic measure, which strengthens a result by Chen Moy from 1959. We also relate the entropy rates of processes linked by the normalized transport.
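
The abstract names comma-separated codes as the special case that self-avoiding codes generalize. As a rough illustration only, the sketch below shows a comma-separated code on finite sequences: every word is terminated by a reserved separator letter, so a sequence of words maps to a sequence of letters and back without loss. The names `encode`, `decode`, and `COMMA` are assumptions made for this example; the paper's normalized transport acts on probability measures over infinite sequences, which this sketch does not attempt to model.

```python
from typing import Iterable, List

# Assumed separator letter (the "comma"); it must not occur inside any word.
COMMA = " "


def encode(words: Iterable[str], comma: str = COMMA) -> str:
    """Map a word sequence to a letter sequence by ending each word with the comma."""
    pieces = []
    for word in words:
        if comma in word:
            raise ValueError(f"word {word!r} contains the separator")
        pieces.append(word + comma)
    return "".join(pieces)


def decode(letters: str, comma: str = COMMA) -> List[str]:
    """Recover the word sequence by cutting the letter sequence after every comma."""
    if not letters:
        return []
    if not letters.endswith(comma):
        raise ValueError("letter sequence ends inside a codeword")
    # The final split piece is the empty string after the last comma; drop it.
    return letters.split(comma)[:-1]


if __name__ == "__main__":
    words = ["from", "letters", "to", "words", "and", "back"]
    letters = encode(words)
    print(repr(letters))
    print(decode(letters) == words)
```

Because the separator never occurs inside a word, every occurrence of it marks a word boundary, which is what makes decoding unambiguous in this simple case.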
Source journal
IEEE Transactions on Information Theory (Engineering & Technology: Engineering, Electrical & Electronic)
CiteScore: 5.70
Self-citation rate: 20.00%
Articles published: 514
Review time: 12 months
About the journal: The IEEE Transactions on Information Theory is a journal that publishes theoretical and experimental papers concerned with the transmission, processing, and utilization of information. The boundaries of acceptable subject matter are intentionally not sharply delimited. Rather, it is hoped that as the focus of research activity changes, a flexible policy will permit this Transactions to follow suit. Current appropriate topics are best reflected by recent Tables of Contents; they are summarized in the titles of editorial areas that appear on the inside front cover.