{"title":"Orbit Structure of Grassmannian G2,m and a Decoder for Grassmann Code C(2, m)","authors":"Fernando L. Piñero;Prasant Singh","doi":"10.1109/TIT.2022.3213568","DOIUrl":"10.1109/TIT.2022.3213568","url":null,"abstract":"In this article, we consider decoding Grassmann codes, linear codes associated to the Grassmannian and its embedding in a projective space. We look at the orbit structure of Grassmannian arising from the multiplicative group $\\mathbb{F}_{q^{m}}^{*}$ in $GL_{m}(q)$. We project the corresponding Grassmann code onto these orbits to obtain a subcode of a $q$-ary Reed-Solomon code. We prove that some of these projections contain an information set of the parent Grassmann code. By improving the decoding capacity of Peterson’s decoding algorithm for the projected subcodes, we prove that one can correct up to $\\lfloor (d-1)/2 \\rfloor$ errors for Grassmann code, where $d$ is the minimum distance of Grassmann code.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"69 3","pages":"1509-1520"},"PeriodicalIF":2.5,"publicationDate":"2022-10-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9316090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Group Lasso: Optimal Sample Complexity, Convergence Rate, and Statistical Inference","authors":"T. Tony Cai;Anru R. Zhang;Yuchen Zhou","doi":"10.1109/TIT.2022.3175455","DOIUrl":"10.1109/TIT.2022.3175455","url":null,"abstract":"We study sparse group Lasso for high-dimensional double sparse linear regression, where the parameter of interest is simultaneously element-wise and group-wise sparse. This problem is an important instance of the simultaneously structured model – an actively studied topic in statistics and machine learning. In the noiseless case, matching upper and lower bounds on sample complexity are established for the exact recovery of sparse vectors and for stable estimation of approximately sparse vectors, respectively. In the noisy case, upper and matching minimax lower bounds for estimation error are obtained. We also consider the debiased sparse group Lasso and investigate its asymptotic property for the purpose of statistical inference. Finally, numerical studies are provided to support the theoretical results.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"68 9","pages":"5975-6002"},"PeriodicalIF":2.5,"publicationDate":"2022-03-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10131968","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mechanisms for Hiding Sensitive Genotypes With Information-Theoretic Privacy","authors":"Fangwei Ye;Hyunghoon Cho;Salim El Rouayheb","doi":"10.1109/TIT.2022.3156276","DOIUrl":"10.1109/TIT.2022.3156276","url":null,"abstract":"Motivated by the growing availability of personal genomics services, we study an information-theoretic privacy problem that arises when sharing genomic data: a user wants to share his or her genome sequence while keeping the genotypes at certain positions hidden, which could otherwise reveal critical health-related information. A straightforward solution of erasing (masking) the chosen genotypes does not ensure privacy, because the correlation between nearby positions can leak the masked genotypes. We introduce an erasure-based privacy mechanism with perfect information-theoretic privacy, whereby the released sequence is statistically independent of the sensitive genotypes. Our mechanism can be interpreted as a locally-optimal greedy algorithm for a given processing order of sequence positions, where utility is measured by the number of positions released without erasure. We show that finding an optimal order is NP-hard in general and provide an upper bound on the optimal utility. For sequences from hidden Markov models, a standard modeling approach in genetics, we propose an efficient algorithmic implementation of our mechanism with complexity polynomial in sequence length. Moreover, we illustrate the robustness of the mechanism by bounding the privacy leakage from erroneous prior distributions. Our work is a step towards more rigorous control of privacy in genomic data sharing.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"68 6","pages":"4090-4105"},"PeriodicalIF":2.5,"publicationDate":"2022-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243750/pdf/nihms-1850165.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9991410","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal High-Order Tensor SVD via Tensor-Train Orthogonal Iteration","authors":"Yuchen Zhou;Anru R. Zhang;Lili Zheng;Yazhen Wang","doi":"10.1109/TIT.2022.3152733","DOIUrl":"10.1109/TIT.2022.3152733","url":null,"abstract":"This paper studies a general framework for high-order tensor SVD. We propose a new computationally efficient algorithm, tensor-train orthogonal iteration (TTOI), that aims to estimate the low tensor-train rank structure from the noisy high-order tensor observation. The proposed TTOI consists of initialization via TT-SVD [Oseledets (2011)] and new iterative backward/forward updates. We develop the general upper bound on estimation error for TTOI with the support of several new representation lemmas on tensor matricizations. By developing a matching information-theoretic lower bound, we also prove that TTOI achieves the minimax optimality under the spiked tensor model. The merits of the proposed TTOI are illustrated through applications to estimation and dimension reduction of high-order Markov processes, numerical studies, and a real data example on New York City taxi travel records. The software of the proposed algorithm is available online (https://github.com/Lili-Zheng-stat/TTOI).","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"68 6","pages":"3991-4019"},"PeriodicalIF":2.5,"publicationDate":"2022-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9585995/pdf/nihms-1809459.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9610977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trace Reconstruction Problems in Computational Biology","authors":"Vinnu Bhardwaj;Pavel A. Pevzner;Cyrus Rashtchian;Yana Safonova","doi":"10.1109/TIT.2020.3030569","DOIUrl":"10.1109/TIT.2020.3030569","url":null,"abstract":"The problem of reconstructing a string from its error-prone copies, the trace reconstruction problem, was introduced by Vladimir Levenshtein two decades ago. While there has been considerable theoretical work on trace reconstruction, practical solutions have only recently started to emerge in the context of two rapidly developing research areas: immunogenomics and DNA data storage. In immunogenomics, traces correspond to mutated copies of genes, with mutations generated naturally by the adaptive immune system. In DNA data storage, traces correspond to noisy copies of DNA molecules that encode digital data, with errors being artifacts of the data retrieval process. In this paper, we introduce several new trace generation models and open questions relevant to trace reconstruction for immunogenomics and DNA data storage, survey theoretical results on trace reconstruction, and highlight their connections to computational biology. Throughout, we discuss the applicability and shortcomings of known solutions and suggest future research directions.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"67 6","pages":"3295-3314"},"PeriodicalIF":2.5,"publicationDate":"2020-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIT.2020.3030569","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39043637","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Device-Independent Quantum Key Distribution","authors":"Rahul Jain;Carl A. Miller;Yaoyun Shi","doi":"10.1109/TIT.2020.2986740","DOIUrl":"10.1109/TIT.2020.2986740","url":null,"abstract":"A prominent application of quantum cryptography is the distribution of cryptographic keys that are provably secure. Such security proofs were extended by Vazirani and Vidick (Physical Review Letters, 113, 140501, 2014) to the device-independent (DI) scenario, where the users do not need to trust the integrity of the underlying quantum devices. The protocols analyzed by them and by subsequent authors all require a sequential execution of N multiplayer games, where N is the security parameter. In this work, we prove the security of a protocol where all games are executed in parallel. Besides decreasing the number of time-steps necessary for key generation, this result reduces the security requirements for DI-QKD by allowing arbitrary information leakage of each user's inputs within his or her lab. To the best of our knowledge, this is the first parallel security proof for a fully device-independent QKD protocol. Our protocol tolerates a constant level of device imprecision and achieves a linear key rate.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"66 9","pages":"5567-5584"},"PeriodicalIF":2.5,"publicationDate":"2020-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIT.2020.2986740","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25424542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse Recovery Beyond Compressed Sensing: Separable Nonlinear Inverse Problems","authors":"Brett Bernstein;Sheng Liu;Chrysa Papadaniil;Carlos Fernandez-Granda","doi":"10.1109/TIT.2020.2985015","DOIUrl":"10.1109/TIT.2020.2985015","url":null,"abstract":"Extracting information from nonlinear measurements is a fundamental challenge in data analysis. In this work, we consider separable inverse problems, where the data are modeled as a linear combination of functions that depend nonlinearly on certain parameters of interest. These parameters may represent neuronal activity in a human brain, frequencies of electromagnetic waves, fluorescent probes in a cell, or magnetic relaxation times of biological tissues. Separable nonlinear inverse problems can be reformulated as underdetermined sparse-recovery problems, and solved using convex programming. This approach has had empirical success in a variety of domains, from geophysics to medical imaging, but lacks a theoretical justification. In particular, compressed-sensing theory does not apply, because the measurement operators are deterministic and violate incoherence conditions such as the restricted-isometry property. Our main contribution is a theory for sparse recovery adapted to deterministic settings. We show that convex programming succeeds in recovering the parameters of interest, as long as their values are sufficiently distinct with respect to the correlation structure of the measurement operator. The theoretical results are illustrated through numerical experiments for two applications: heat-source localization and estimation of brain activity from electroencephalography data.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"66 9","pages":"5904-5926"},"PeriodicalIF":2.5,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIT.2020.2985015","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"38373977","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Sparse and Low-Rank Tensor Estimation via Cubic Sketchings","authors":"Botao Hao;Anru Zhang;Guang Cheng","doi":"10.1109/TIT.2020.2982499","DOIUrl":"10.1109/TIT.2020.2982499","url":null,"abstract":"In this paper, we propose a general framework for sparse and low-rank tensor estimation from cubic sketchings. A two-stage non-convex implementation is developed based on sparse tensor decomposition and thresholded gradient descent, which ensures exact recovery in the noiseless case and stable recovery in the noisy case with high probability. The non-asymptotic analysis sheds light on an interplay between optimization error and statistical error. The proposed procedure is shown to be rate-optimal under certain conditions. As a technical by-product, novel high-order concentration inequalities are derived for studying high-moment sub-Gaussian tensors. An interesting tensor formulation illustrates the potential application to high-order interaction pursuit in high-dimensional linear regression.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"66 9","pages":"5927-5964"},"PeriodicalIF":2.5,"publicationDate":"2020-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIT.2020.2982499","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25511785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Levenshtein Distance, Sequence Comparison and Biological Database Search","authors":"Bonnie Berger;Michael S. Waterman;Yun William Yu","doi":"10.1109/TIT.2020.2996543","DOIUrl":"10.1109/TIT.2020.2996543","url":null,"abstract":"Levenshtein edit distance has played a central role, both past and present, in sequence alignment in particular and biological database similarity search in general. We start our review with a history of dynamic programming algorithms for computing Levenshtein distance and sequence alignments. Following, we describe how those algorithms led to heuristics employed in the most widely used software in bioinformatics, BLAST, a program to search DNA and protein databases for evolutionarily relevant similarities. More recently, the advent of modern genomic sequencing and the volume of data it generates has resulted in a return to the problem of local alignment. We conclude with how the mathematical formulation of Levenshtein distance as a metric made possible additional optimizations to similarity search in biological contexts. These modern optimizations are built around the low metric entropy and fractional dimensionality of biological databases, enabling orders of magnitude acceleration of biological similarity search.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"67 6","pages":"3287-3294"},"PeriodicalIF":2.5,"publicationDate":"2020-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TIT.2020.2996543","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39180972","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized Linear Algebra Approaches to Estimate the Von Neumann Entropy of Density Matrices.","authors":"Eugenia-Maria Kontopoulou, Gregory-Paul Dexter, Wojciech Szpankowski, Ananth Grama, Petros Drineas","doi":"10.1109/tit.2020.2971991","DOIUrl":"https://doi.org/10.1109/tit.2020.2971991","url":null,"abstract":"The von Neumann entropy, named after John von Neumann, is an extension of the classical concept of entropy to the field of quantum mechanics. From a numerical perspective, von Neumann entropy can be computed simply by computing all eigenvalues of a density matrix, an operation that could be prohibitively expensive for large-scale density matrices. We present and analyze three randomized algorithms to approximate von Neumann entropy of real density matrices: our algorithms leverage recent developments in the Randomized Numerical Linear Algebra (RandNLA) literature, such as randomized trace estimators, provable bounds for the power method, and the use of random projections to approximate the eigenvalues of a matrix. All three algorithms come with provable accuracy guarantees and our experimental evaluations support our theoretical findings showing considerable speedup with small loss in accuracy.","PeriodicalId":13494,"journal":{"name":"IEEE Transactions on Information Theory","volume":"66 8","pages":"5003-5021"},"PeriodicalIF":2.5,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/tit.2020.2971991","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"25511784","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}