{"title":"Overparameterized ReLU Neural Networks Learn the Simplest Model: Neural Isometry and Phase Transitions","authors":"Yifei Wang;Yixuan Hua;Emmanuel J. Candès;Mert Pilanci","doi":"10.1109/TIT.2025.3530355","DOIUrl":"https://doi.org/10.1109/TIT.2025.3530355","url":null,"abstract":"The practice of deep learning has shown that neural networks generalize remarkably well even with an extreme number of learned parameters. This appears to contradict traditional statistical wisdom, in which a trade-off between model complexity and fit to the data is essential. We aim to address this discrepancy by adopting a convex optimization and sparse recovery perspective. We consider the training and generalization properties of two-layer ReLU networks with standard weight decay regularization. Under certain regularity assumptions on the data, we show that ReLU networks with an arbitrary number of parameters learn only simple models that explain the data. This is analogous to the recovery of the sparsest linear model in compressed sensing. For ReLU networks and their variants with skip connections or normalization layers, we present isometry conditions that ensure the exact recovery of planted neurons. For randomly generated data, we show the existence of a phase transition in recovering planted neural network models, which is easy to describe: whenever the ratio between the number of samples and the dimension exceeds a numerical threshold, the recovery succeeds with high probability; otherwise, it fails with high probability. Surprisingly, ReLU networks learn simple and sparse models that generalize well even when the labels are noisy. 
The phase transition phenomenon is confirmed through numerical experiments.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1926-1977"},"PeriodicalIF":2.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Split-State Non-Malleable Codes and Secret Sharing Schemes for Quantum Messages","authors":"Naresh Goud Boddu;Vipul Goyal;Rahul Jain;João Ribeiro","doi":"10.1109/TIT.2025.3530385","DOIUrl":"https://doi.org/10.1109/TIT.2025.3530385","url":null,"abstract":"Non-malleable codes are fundamental objects at the intersection of cryptography and coding theory. These codes provide security guarantees even in settings where error correction and detection are impossible, and have found applications to several other cryptographic tasks. One of the strongest and most well-studied adversarial tampering models is 2-split-state tampering. Here, a codeword is split into two parts which are stored in physically distant servers, and the adversary can then independently tamper with each part using arbitrary functions. This model can be naturally extended to the secret sharing setting with several parties by having the adversary independently tamper with each share. Previous works on non-malleable coding and secret sharing in the split-state tampering model only considered the encoding of classical messages. Furthermore, until recent work by Aggarwal, Boddu, and Jain (IEEE Trans. Inf. Theory 2024 & arXiv 2022), adversaries with quantum capabilities and shared entanglement had not been considered, and it is a priori not clear whether previous schemes remain secure in this model. In this work, we introduce the notions of split-state non-malleable codes and secret sharing schemes for quantum messages secure against quantum adversaries with shared entanglement. Then, we present explicit constructions of such schemes that achieve low-error non-malleability. 
More precisely, for some constant <inline-formula> <tex-math>$c \\gt 0$ </tex-math></inline-formula>, we construct efficiently encodable and decodable split-state non-malleable codes and secret sharing schemes for quantum messages preserving entanglement with external systems and achieving security against quantum adversaries having shared entanglement with codeword length n, any message length at most <inline-formula> <tex-math>$n^{c}$ </tex-math></inline-formula>, and error <inline-formula> <tex-math>$\\varepsilon =2^{-{n^{c}}}$ </tex-math></inline-formula>. In the easier setting of average-case non-malleability, we achieve efficient non-malleable coding with rate close to <inline-formula> <tex-math>$1/11$ </tex-math></inline-formula>.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 4","pages":"2838-2871"},"PeriodicalIF":2.2,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667314","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sequences With Identical Autocorrelation Functions","authors":"Daniel J. Katz;Adeebur Rahman;Michael J Ward","doi":"10.1109/TIT.2025.3529639","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529639","url":null,"abstract":"Aperiodic autocorrelation is an important indicator of performance of sequences used in communications, remote sensing, and scientific instrumentation. Knowing a sequence’s autocorrelation function, which reports the autocorrelation at every possible translation, is equivalent to knowing the magnitude of the sequence’s Fourier transform. The phase problem is the difficulty in resolving this lack of phase information. We say that two sequences are equicorrelational to mean that they have the same aperiodic autocorrelation function. Sequences used in technological applications often have restrictions on their terms: they are not arbitrary complex numbers, but come from a more restricted alphabet. For example, binary sequences involve terms equal to only +1 and −1. We investigate the necessary and sufficient conditions for two sequences to be equicorrelational, where we take their alphabet into consideration. There are trivial forms of equicorrelationality arising from modifications that predictably preserve the autocorrelation, for example, negating a binary sequence or reversing the order of its terms. By a search of binary sequences up to length 44, we find that nontrivial equicorrelationality among binary sequences does occur, but is rare. An integer n is said to be equivocal when there are binary sequences of length n that are nontrivially equicorrelational; otherwise n is unequivocal. For <inline-formula> <tex-math>$n \\leq 44$ </tex-math></inline-formula>, we found that the unequivocal lengths are 1–8, 10, 11, 13, 14, 19, 22, 23, 26, 29, 37, and 38. 
We pose open questions about the finitude of unequivocal numbers and the probability of nontrivial equicorrelationality occurring among binary sequences.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 4","pages":"3194-3202"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Quantum Data-Syndrome Codes and Belief Propagation Decoding for Phenomenological Noise","authors":"Kao-Yueh Kuo;Ching-Yi Lai","doi":"10.1109/TIT.2025.3529773","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529773","url":null,"abstract":"Quantum stabilizer codes often struggle with syndrome errors due to measurement imperfections. Typically, multiple rounds of syndrome extraction are employed to ensure reliable error information. In this paper, we consider phenomenological decoding problems, where data qubit errors may occur between extractions, and each measurement can be faulty. We introduce generalized quantum data-syndrome codes along with a generalized check matrix that integrates both quaternary and binary alphabets to represent diverse error sources. This results in a Tanner graph with mixed variable nodes, enabling the design of belief propagation (BP) decoding algorithms that effectively handle phenomenological errors. Importantly, our BP decoders are applicable to general sparse quantum codes. Through simulations, we achieve an error threshold of more than 3% for quantum memory protected by rotated toric codes, using solely BP without post-processing. Our results indicate that d rounds of syndrome extraction are sufficient for a toric code of distance d. We observe that at high error rates, fewer rounds of syndrome extraction tend to perform better, while more rounds improve performance at lower error rates. Additionally, we propose a method to construct effective redundant stabilizer checks for single-shot error correction. 
Our simulations show that BP decoding remains highly effective even with a high syndrome error rate.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1824-1840"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465587","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Estimation of the Null Distribution in Large-Scale Inference","authors":"Subhodh Kotekal;Chao Gao","doi":"10.1109/TIT.2025.3529457","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529457","url":null,"abstract":"The advent of large-scale inference has spurred reexamination of conventional statistical thinking. In a series of highly original articles, Efron persuasively illustrated the danger for downstream inference in assuming the veracity of a posited null distribution. In a Gaussian model for n many z-scores with at most <inline-formula> <tex-math>$k \\lt \\frac {n}{2}$ </tex-math></inline-formula> nonnulls, Efron suggests estimating the parameters of an empirical null <inline-formula> <tex-math>$N(\\theta , \\sigma ^{2})$ </tex-math></inline-formula> instead of assuming the theoretical null <inline-formula> <tex-math>$N(0, 1)$ </tex-math></inline-formula>. Looking to the robust statistics literature by viewing the nonnulls as outliers is unsatisfactory as the question of optimal rates is still open; even consistency is not known in the regime <inline-formula> <tex-math>$k \\asymp n$ </tex-math></inline-formula> which is especially relevant to many large-scale inference applications. However, provably rate-optimal robust estimators have been developed in other models (e.g. Huber contamination) which appear quite close to Efron’s proposal. Notably, the impossibility of consistency when <inline-formula> <tex-math>$k \\asymp n$ </tex-math></inline-formula> in these other models may suggest the same major weakness afflicts Efron’s popularly adopted recommendation. A sound evaluation thus requires a complete understanding of information-theoretic limits. We characterize the regime of k for which consistent estimation is possible, notably without imposing any assumptions at all on the nonnull effects. 
Unlike in other robust models, it is shown consistent estimation of the location parameter is possible if and only if <inline-formula> <tex-math>$\\frac {n}{2} {-} k = \\omega (\\sqrt {n})$ </tex-math></inline-formula>, and of the scale parameter in the entire regime <inline-formula> <tex-math>$k \\lt \\frac {n}{2}$ </tex-math></inline-formula>. Furthermore, we establish sharp minimax rates and show estimators based on the empirical characteristic function are optimal by exploiting the Gaussian character of the data.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2075-2103"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455191","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Tight Exponential Strong Converse for Source Coding Problem With Encoded Side Information","authors":"Daisuke Takeuchi;Shun Watanabe","doi":"10.1109/TIT.2025.3529612","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529612","url":null,"abstract":"The source coding problem with encoded side information is considered. A lower bound on the strong converse exponent has been derived by Oohama, but its tightness has not been clarified. In this paper, we derive a tight strong converse exponent. For the special case where the side-information does not exist, we demonstrate that our tight exponent of the Wyner-Ahlswede-Körner (WAK) problem reduces to the known tight expression of that special case while Oohama’s lower bound is strictly loose. The converse part is proved by a judicious use of the change-of-measure argument, which was introduced by Gu and Effros and further developed by Tyagi and Watanabe. A key component of the methodology by Tyagi and Watanabe is the use of soft Markov constraint, which was originally introduced by Oohama, as a penalty term to prove the Markov constraint at the end. A technical innovation of this paper compared to Tyagi and Watanabe is recognizing that the soft Markov constraint is a part of the exponent, rather than a penalty term that should vanish at the end; this recognition enables us to derive the matching achievability bound. In fact, via numerical experiment, we provide evidence that the soft Markov constraint is strictly positive. Compared to Oohama’s derivation of the lower bound, which relies on the single-letterization of a certain moment-generating function, the derivation of our tight exponent only involves manipulations of the Kullback-Leibler divergence and Shannon entropies. The achievability part is derived by a careful analysis of the type argument; however, unlike the conventional analysis for the achievable rate region, we need to derive the soft Markov constraint in the analysis of the correct probability. 
Furthermore, we present an application of our derivation of the strong converse exponent to the privacy amplification.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"1533-1545"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143465671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generalized Fractional Repetition Codes for Binary Coded Computations","authors":"Neophytos Charalambides;Hessam Mahdavifar;Alfred O. Hero","doi":"10.1109/TIT.2025.3529680","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529680","url":null,"abstract":"This paper addresses the gradient coding and coded matrix multiplication problems in distributed optimization and coded computing. We present a computationally efficient coding method which overcomes the drawbacks of the Fractional Repetition Coding gradient coding method proposed by Tandon et al., and can also be leveraged by coded computing networks whose servers are of heterogeneous nature. Specifically, we propose a construction for fractional repetition gradient coding that ensures the generator matrix remains close to perfectly balanced for any set of coding parameters and that admits a low-complexity decoding step. The proposed binary encoding avoids operations over the real and complex numbers which inherently introduce numerical and rounding errors, thereby enabling accurate distributed encodings of the partial gradients. We then make connections between gradient coding and coded matrix multiplication. Specifically, we show that any gradient coding scheme can be extended to coded matrix multiplication. 
Furthermore, we show how the proposed binary gradient coding scheme can be used to construct two different coded matrix multiplication schemes, each achieving different trade-offs.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2170-2194"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Grouping-Based Cyclic Scheduling Under Age of Correlated Information Constraints","authors":"Lehan Wang;Jingzhou Sun;Yuxuan Sun;Sheng Zhou;Zhisheng Niu;Miao Jiang;Lu Geng","doi":"10.1109/TIT.2025.3529497","DOIUrl":"https://doi.org/10.1109/TIT.2025.3529497","url":null,"abstract":"This paper studies an internet of things (IoT) network where a fusion center relies on multi-view and correlated information generated by multiple sources to monitor various regions. Each region possesses hard age of correlated information (AoCI) constraints for information update, and accordingly we propose a scheduling policy to satisfy such needs and minimize the required wireless resources. We first approximate the problem to a dual bin-packing problem. Secondly, efficient scheduling policies are identified when the age constraints possess special mathematical properties, where the number of channels at most required is analyzed. Optimality conditions of the proposed policies are presented. For general constraints, a two-step grouping algorithm for multi-view (TGAM) is proposed to establish scheduling policies. Under TGAM, the constraints are mapped into a combination of the special constraints. To quickly identify an optimized mapping from a vast solution space, TGAM heuristically groups the regions according to their constraints and then searches for the optimal mapping for each group. Numerical results demonstrate that, compared to a derived lower bound, the proposed TGAM requires only 1.07% more channels. 
Additionally, given the number of channels, the number of regions that can be served by TGAM is significantly larger than that of the state-of-the-art algorithm.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2218-2244"},"PeriodicalIF":2.2,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455238","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A General Framework for Clustering and Distribution Matching With Bandit Feedback","authors":"Recep Can Yavas;Yuqi Huang;Vincent Y. F. Tan;Jonathan Scarlett","doi":"10.1109/TIT.2025.3528655","DOIUrl":"https://doi.org/10.1109/TIT.2025.3528655","url":null,"abstract":"We develop a general framework for clustering and distribution matching problems with bandit feedback. We consider a K-armed bandit model where some subset of K arms is partitioned into M groups. Within each group, the random variable associated to each arm follows the same distribution on a finite alphabet. At each time step, the decision maker pulls an arm and observes its outcome from the random variable associated to that arm. Subsequent arm pulls depend on the history of arm pulls and their outcomes. The decision maker has no knowledge of the distributions of the arms or the underlying partitions. The task is to devise an online algorithm to learn the underlying partition of arms with the least number of arm pulls on average and with an error probability not exceeding a pre-determined value <inline-formula> <tex-math>$\\delta $ </tex-math></inline-formula>. Several existing problems fall under our general framework, including finding M pairs of arms, odd arm identification, and N-ary clustering of K arms. We derive a non-asymptotic lower bound on the average number of arm pulls for any online algorithm with an error probability not exceeding <inline-formula> <tex-math>$\\delta $ </tex-math></inline-formula>. Furthermore, we develop a computationally-efficient online algorithm based on the Track-and-Stop method and Frank-Wolfe algorithm, and show that the average number of arm pulls of our algorithm asymptotically matches that of the lower bound. 
Our refined analysis also uncovers a novel bound on the speed at which the average number of arm pulls of our algorithm converges to the fundamental limit as <inline-formula> <tex-math>$\\delta $ </tex-math></inline-formula> vanishes.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2116-2139"},"PeriodicalIF":2.2,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455308","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Divergence Maximizing Linear Projection for Supervised Dimension Reduction","authors":"Biao Chen;Joshua Kortje","doi":"10.1109/TIT.2025.3528340","DOIUrl":"https://doi.org/10.1109/TIT.2025.3528340","url":null,"abstract":"This paper proposes two linear projection methods for supervised dimension reduction using only first- and second-order statistics. The methods, each catering to a different parameter regime, are derived under the general Gaussian model by maximizing the Kullback-Leibler divergence between the two classes in the projected sample for a binary classification problem. They subsume existing linear projection approaches developed under simplifying assumptions of Gaussian distributions, such as these distributions might share an equal mean or covariance matrix. As a by-product, we establish that the multi-class linear discriminant analysis, a celebrated method for classification and supervised dimension reduction, is provably optimal for maximizing pairwise Kullback-Leibler divergence when the Gaussian populations share an identical covariance matrix. For the case when the Gaussian distributions share an equal mean, we establish conditions under which the optimal subspace remains invariant regardless of how the Kullback-Leibler divergence is defined, despite the asymmetry of the divergence measure itself. Such conditions encompass the classical case of signal plus noise, where both signal and noise have zero mean and arbitrary covariance matrices. 
Experiments are conducted to validate the proposed solutions, demonstrate their superior performance over existing alternatives, and illustrate the procedure for selecting the appropriate linear projection solution.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"71 3","pages":"2104-2115"},"PeriodicalIF":2.2,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143455192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}