{"title":"The Existence of Plotkin-Optimal Linear Codes Over ℤ4","authors":"Hopein Christofen Tang","doi":"10.1109/TIT.2025.3582936","DOIUrl":"https://doi.org/10.1109/TIT.2025.3582936","url":null,"abstract":"We generalize the Plotkin-type Lee distance bound for linear codes over <inline-formula> <tex-math>$\mathbb {Z}_{4}$ </tex-math></inline-formula> to several new and stronger bounds. We apply these bounds to determine all integers <i>n</i> such that Plotkin-optimal linear codes over <inline-formula> <tex-math>$\mathbb {Z}_{4}$ </tex-math></inline-formula> of length <i>n</i> and type <inline-formula> <tex-math>$4^{k_{1}}2^{k_{2}}$ </tex-math></inline-formula> exist for any given non-negative integers <inline-formula> <tex-math>$k_{1}$ </tex-math></inline-formula> and <inline-formula> <tex-math>$k_{2}$ </tex-math></inline-formula>. We furthermore provide construction methods for Plotkin-optimal linear codes over <inline-formula> <tex-math>$\mathbb {Z}_{4}$ </tex-math></inline-formula> for each such length. Our results are established in large part by considering the column multiplicities of generator matrices.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"6712-6726"},"PeriodicalIF":2.9,"publicationDate":"2025-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identification Over Permutation Channels","authors":"Abhishek Sarkar;Bikash Kumar Dey","doi":"10.1109/TIT.2025.3582862","DOIUrl":"https://doi.org/10.1109/TIT.2025.3582862","url":null,"abstract":"We study message identification over a noiseless <i>q</i>-ary uniform permutation channel, where the transmitted vector is permuted by a permutation chosen uniformly at random. The channel is noiseless in the sense that it only rearranges the symbols into a different order, without changing the symbol values. For discrete memoryless channels (DMCs), the number of identifiable messages grows doubly exponentially, and the identification capacity, the maximum second-order exponent, is known to equal the Shannon capacity of the DMC. Permutation channels, in contrast, support reliable communication of only polynomially many messages. A simple achievability result shows that message sizes growing as <inline-formula> <tex-math>$2^{\epsilon _{n}n^{q-1}}$ </tex-math></inline-formula> are identifiable for any <inline-formula> <tex-math>$\epsilon _{n}\rightarrow 0$ </tex-math></inline-formula>. We prove two converse results. A “soft” converse shows that for any <inline-formula> <tex-math>$R\gt 0$ </tex-math></inline-formula>, there is no sequence of identification codes with message size growing as <inline-formula> <tex-math>$2^{Rn^{q-1}}$ </tex-math></inline-formula> with a power-law decay (<inline-formula> <tex-math>$n^{-\mu }$ </tex-math></inline-formula>) of the error probability. We also prove a “strong” converse showing that for any sequence of identification codes with message size <inline-formula> <tex-math>$2^{R_{n} n^{q-1}}$ </tex-math></inline-formula>, where <inline-formula> <tex-math>$R_{n} \rightarrow \infty $ </tex-math></inline-formula>, the sum of the Type I and Type II error probabilities approaches at least 1 as <inline-formula> <tex-math>$n\rightarrow \infty $ </tex-math></inline-formula>. To prove the soft converse, we use a sequence of steps to construct a new identification code with a simpler structure that relates to a set system, and then use a lower bound on the normalized maximum pairwise intersection of a set system. To prove the strong converse, we use results on the approximation of distributions. The achievability and converse results are generalized to coding over multiple blocks. We also show that under deterministic encoding, the number of messages that can be identified per block equals the number of types, i.e., <inline-formula> <tex-math>$\binom {n+q-1}{q-1}$ </tex-math></inline-formula>, which is the same as the message size for reliable communication. We finally study message identification over a <i>q</i>-ary uniform permutation channel in the presence of causal block-wise feedback from the receiver, where the encoder receives the entire <i>n</i>-length received block after the transmission of that block is complete. We show that in the presence of feedback, the maximum number of identifiable messages grows doubly exponentially even under deterministic encoding, and we present a two-phase achievability scheme.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"6668-6691"},"PeriodicalIF":2.9,"publicationDate":"2025-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892393","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Conditional Entropies of k-Deletion/Insertion Channels","authors":"Shubhransh Singhvi;Omer Sabary;Daniella Bar-Lev;Eitan Yaakobi","doi":"10.1109/TIT.2025.3581849","DOIUrl":"https://doi.org/10.1109/TIT.2025.3581849","url":null,"abstract":"The channel output entropy of a transmitted sequence is the entropy of the possible channel outputs, and similarly, the channel input entropy of a received sequence is the entropy of all possible transmitted sequences. The goal of this work is to study these entropy values for the <i>k</i>-deletion and <i>k</i>-insertion channels, where exactly <i>k</i> symbols are deleted from or inserted into the transmitted sequence, respectively. If all possible sequences are transmitted with the same probability, then studying the input and output entropies becomes equivalent. For both the 1-deletion and 1-insertion channels, it is shown that among all sequences with a fixed number of runs, the input entropy is minimized by sequences with a skewed distribution of run lengths and maximized by sequences with a balanced distribution of run lengths. Among our results, we establish a conjecture by Atashpendar et al., which claims that for the 1-deletion channel, the input entropy is maximized by the alternating sequences among all binary sequences. This conjecture is also verified for the 2-deletion channel, for which it is proved that sequences with a single run minimize the input entropy.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"6503-6516"},"PeriodicalIF":2.9,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892380","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving the Fundamental Limit of Lossless Analog Compression via Polarization","authors":"Shuai Yuan;Liuquan Yao;Yuan Li;Huazi Zhang;Jun Wang;Wen Tong;Zhiming Ma","doi":"10.1109/TIT.2025.3582110","DOIUrl":"https://doi.org/10.1109/TIT.2025.3582110","url":null,"abstract":"In this paper, we study lossless analog compression of <i>i.i.d.</i> discrete-continuous mixed signals via a polarization-based framework. We prove that for a discrete-continuous mixed source, the error probability of maximum a posteriori (MAP) estimation polarizes under the Hadamard transform, which extends the polarization phenomenon to the analog domain. Building on this insight, we propose partial Hadamard compression and develop the corresponding analog successive cancellation (SC) decoder. The proposed scheme consists of deterministic measurement matrices and a non-iterative reconstruction algorithm, providing benefits in both space and computational complexity. Using the polarization of the error probability, we prove that our approach achieves the information-theoretic limit for lossless analog compression established by Wu and Verdú.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"7367-7395"},"PeriodicalIF":2.9,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144891185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Theoretical Guarantees for Sparse Principal Component Analysis Based on the Elastic Net","authors":"Haoyi Yang;Teng Zhang;Lingzhou Xue","doi":"10.1109/TIT.2025.3582247","DOIUrl":"https://doi.org/10.1109/TIT.2025.3582247","url":null,"abstract":"Sparse principal component analysis (SPCA) is widely used for dimensionality reduction and feature extraction in high-dimensional data analysis. Despite many methodological and theoretical developments in the past two decades, the theoretical guarantees of the popular SPCA algorithm proposed by Zou et al. (2006) based on the elastic net are still unknown. This paper aims to address this critical theoretical gap. We first revisit the SPCA algorithm of Zou et al. (2006) and present our implementation. We also study a computationally more efficient variant of this algorithm that can be viewed as a limiting case of SPCA. We provide guarantees of convergence to a stationary point for both algorithms and prove that, under a sparse spiked covariance model, both algorithms can recover the principal subspace consistently under mild regularity conditions. We show that their estimation error bounds match the best available bounds in existing works, or the minimax rates, up to logarithmic factors. Moreover, we demonstrate the competitive performance of both algorithms in numerical experiments.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"7149-7175"},"PeriodicalIF":2.9,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144891180","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New Bounds of Linear Matrix Codes for the Rosenbloom-Tsfasman Metric and Optimal Constructions","authors":"Xinran Wang;Chengju Li;Ziling Heng","doi":"10.1109/TIT.2025.3581994","DOIUrl":"https://doi.org/10.1109/TIT.2025.3581994","url":null,"abstract":"The Rosenbloom-Tsfasman metric (RT-metric for short) is a generalization of the Hamming metric. Matrix codes under the RT-metric have been used for information transmission over parallel channels. In this paper, we develop new upper bounds on the minimum RT-distance of an <inline-formula> <tex-math>$[h \times n, k, d_{\mathrm {RT}}]$ </tex-math></inline-formula> linear matrix code, which generalize the Singleton-type bound derived by Rosenbloom and Tsfasman. Notably, these upper bounds build a connection between the RT-metric and the Hamming metric. Constructions of linear matrix codes are presented and their parameters for the RT-metric are investigated. It is shown that every linear matrix code can be expressed using the trace function, which generalizes the well-known defining-set construction of linear codes. Moreover, we obtain several classes of optimal linear matrix codes.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"6844-6856"},"PeriodicalIF":2.9,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892344","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exact Error Exponents of Concatenated Codes for DNA Storage","authors":"Yan Hao Ling;Jonathan Scarlett","doi":"10.1109/TIT.2025.3581559","DOIUrl":"https://doi.org/10.1109/TIT.2025.3581559","url":null,"abstract":"In this paper, we consider a concatenated-coding-based class of DNA storage codes in which the selected molecules are constrained to be taken from an “inner” codebook associated with the sequencing channel. This codebook is used in a “closed-box” manner, and is only assumed to operate at an achievable rate in the sense of attaining asymptotically vanishing maximal (inner) error probability. We first derive the exact error exponent in a widely-studied regime of constant rate and a linear number of sequencing reads, and show strict improvements over an existing achievable error exponent. Moreover, our achievability analysis is based on a coded-index strategy, implying that such strategies attain the highest error exponents within the broader class of codes that we consider. We then extend our results to other scaling regimes, including a super-linear number of reads, as well as several low-rate regimes. We find that the latter comes with notable intricacies, such as dependencies of the error exponents on the model for sequencing errors.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"6566-6585"},"PeriodicalIF":2.9,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892394","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Weakly Private Information Retrieval From Heterogeneously Trusted Servers","authors":"Wenyuan Zhao;Yu-Shin Huang;Ruida Zhou;Chao Tian","doi":"10.1109/TIT.2025.3581850","DOIUrl":"https://doi.org/10.1109/TIT.2025.3581850","url":null,"abstract":"We study the problem of weakly private information retrieval (PIR) when there is heterogeneity in the servers’ trustworthiness, under the maximal leakage (Max-L) metric and the mutual information (MI) metric. A user wishes to retrieve a desired message from <i>N</i> non-colluding servers efficiently, such that the identity of the desired message is not leaked in a significant manner; however, some servers can be more trustworthy than others. We propose a code construction for this setting and optimize the probability distribution for this construction. For the Max-L metric, it is shown that the optimal probability allocation for the proposed scheme essentially separates the delivery patterns into two parts: a completely private part that has the same download overhead as the capacity-achieving PIR code, and a non-private part that allows complete privacy leakage but has no download overhead, by downloading only from the most trusted server. The optimal solution is established through a sophisticated analysis of the underlying convex optimization problem and a reduction between the homogeneous setting and the heterogeneous setting. For the MI metric, the homogeneous case is studied first, for which the code can be optimized with an explicit probability assignment, while a closed-form solution becomes intractable for the heterogeneous case. Numerical results are provided for both cases to corroborate the theoretical analysis.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"7292-7309"},"PeriodicalIF":2.9,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144896843","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ε-MSR Codes for Any Set of Helper Nodes","authors":"Vinayak Ramkumar;Netanel Raviv;Itzhak Tamo","doi":"10.1109/TIT.2025.3582186","DOIUrl":"https://doi.org/10.1109/TIT.2025.3582186","url":null,"abstract":"Minimum storage regenerating (MSR) codes are a class of maximum distance separable (MDS) array codes capable of repairing any single failed node by downloading the minimum amount of information from each of the helper nodes. However, MSR codes require large sub-packetization levels, which hinders their usefulness in practical settings. This led to the development of another class of MDS array codes called <inline-formula> <tex-math>$\varepsilon $ </tex-math></inline-formula>-MSR codes, for which the repair information downloaded from each helper node is at most a factor of <inline-formula> <tex-math>$(1+\varepsilon)$ </tex-math></inline-formula> from the minimum amount for some <inline-formula> <tex-math>$\varepsilon \gt 0$ </tex-math></inline-formula>. The advantage of <inline-formula> <tex-math>$\varepsilon $ </tex-math></inline-formula>-MSR codes over MSR codes is their small sub-packetization levels. In previous constructions of <inline-formula> <tex-math>$\varepsilon $ </tex-math></inline-formula>-MSR codes, however, several specific nodes are required to participate in the repair of a failed node, which limits the performance of the code in cases where these nodes are not available. In this work, we present a construction of <inline-formula> <tex-math>$\varepsilon $ </tex-math></inline-formula>-MSR codes without this restriction. For a code with <i>n</i> nodes, out of which <i>k</i> store uncoded information, and for any number <i>d</i> of helper nodes (<inline-formula> <tex-math>$k\le d\lt n$ </tex-math></inline-formula>), the repair of a failed node can be done by contacting any set of <i>d</i> surviving nodes. Our construction utilizes group-algebra techniques and requires linear field size. We also generalize the construction to MDS array codes capable of repairing <i>h</i> failed nodes using <i>d</i> helper nodes, with a slightly sub-optimal download from each helper node, for all <inline-formula> <tex-math>$h \le n-k$ </tex-math></inline-formula> and <inline-formula> <tex-math>$k \le d \le n-h$ </tex-math></inline-formula> simultaneously.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 9","pages":"6657-6667"},"PeriodicalIF":2.9,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144892401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Fast Rate Information-Theoretic Bounds on Generalization Errors","authors":"Xuetong Wu;Jonathan H. Manton;Uwe Aickelin;Jingge Zhu","doi":"10.1109/TIT.2025.3581715","DOIUrl":"https://doi.org/10.1109/TIT.2025.3581715","url":null,"abstract":"The generalization error of a learning algorithm refers to the discrepancy between the loss of a learning algorithm on training data and that on unseen testing data. Various information-theoretic bounds on the generalization error have been derived in the literature, where the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual-sample mutual information bound of Bu et al. (2020), itself a tightened version of the first bounds on the topic by Russo and Zou (2016) and Xu and Raginsky (2017), this paper investigates the tightness of these bounds in terms of the dependence of their convergence rates on the sample size <i>n</i>. It has been recognized that these bounds are in general not tight, as is readily verified for the exemplary quadratic Gaussian mean estimation problem, where the individual-sample mutual information bound scales as <inline-formula> <tex-math>$O(\sqrt {1/n})$ </tex-math></inline-formula> while the true generalization error scales as <inline-formula> <tex-math>$O(1/n)$ </tex-math></inline-formula>. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, we show that the fast rate can be recovered when the assumption is made on the excess risk instead of on the loss function, as is usually done in the existing literature. A theoretical justification is given for this choice. The second contribution of the paper is a new set of generalization error bounds based on the <inline-formula> <tex-math>$(\eta, c)$ </tex-math></inline-formula>-central condition, a condition that is relatively easy to verify and has the property that the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to show the effectiveness of these bounds.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 8","pages":"6373-6392"},"PeriodicalIF":2.2,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144695538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}