{"title":"Graph Coded Merkle Tree: Mitigating Data Availability Attacks in Blockchain Systems Using Informed Design of Polar Factor Graphs","authors":"Debarnab Mitra;Lev Tauz;Lara Dolecek","doi":"10.1109/JSAIT.2023.3315148","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3315148","url":null,"abstract":"Data availability (DA) attack is a well-known problem in certain blockchains where users accept an invalid block with unavailable portions. Previous works have used LDPC and 2-D Reed Solomon (2D-RS) codes with Merkle trees to mitigate DA attacks. These codes perform well across various metrics such as DA detection probability and communication cost. However, these codes are difficult to apply to blockchains with large blocks due to large decoding complexity and coding fraud proof size (2D-RS codes), and intractable code guarantees for large code lengths (LDPC codes). In this paper, we focus on large block size applications and address the above challenges by proposing the novel Graph Coded Merkle Tree (GCMT): a Merkle tree encoded using polar encoding graphs. We provide a specialized polar encoding graph design algorithm called Sampling Efficient Freezing and an algorithm to prune the polar encoding graph. 
We demonstrate that the GCMT built using the above techniques results in a better DA detection probability and communication cost compared to LDPC codes, has a lower coding fraud proof size compared to LDPC and 2D-RS codes, provides tractable code guarantees at large code lengths (similar to 2D-RS codes), and has comparable decoding complexity to 2D-RS and LDPC codes.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"434-452"},"PeriodicalIF":0.0,"publicationDate":"2023-09-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50426630","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Minimum Weight Codewords of PAC Codes: The Impact of Pre-Transformation","authors":"Mohammad Rowshan;Jinhong Yuan","doi":"10.1109/JSAIT.2023.3312678","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3312678","url":null,"abstract":"The minimum Hamming distance of a linear block code is the smallest number of bit changes required to transform one valid codeword into another. The code’s minimum distance determines the code’s error-correcting capabilities. Furthermore, the number of minimum weight codewords, a.k.a. error coefficient, gives a good comparative measure for the block error rate (BLER) of linear block codes with identical minimum distance, in particular at a high SNR regime under maximum likelihood (ML) decoding. A code with a smaller error coefficient would give a lower BLER. Unlike polar codes, a closed-form expression for the enumeration of the error coefficient of polarization-adjusted convolutional (PAC) codes is yet unknown. As PAC codes are convolutionally pre-transformed polar codes, we study the impact of pre-transformation on polar codes in terms of minimum Hamming distance and error coefficient by partitioning the codewords into cosets. We show that the minimum distance of PAC codes does not decrease; however, the pre-transformation may reduce the error coefficient depending on the choice of convolutional polynomial. We recognize the properties of the cosets where pre-transformation is ineffective in decreasing the error coefficient, giving a lower bound for the error coefficient. Then, we propose a low-complexity enumeration method that determines the number of minimum weight codewords of PAC codes relying on the error coefficient of polar codes. 
That is, given the error coefficient $\\mathcal{A}_{w_{\\min}}$ of polar codes, we determine the reduction $X$ in the error coefficient due to convolutional pre-transformation in PAC coding and subtract it from the error coefficient of polar codes, $\\mathcal{A}_{w_{\\min}}-X$. Furthermore, we numerically analyze the tightness of the lower bound and the impact of the choice of the convolutional polynomial on the error coefficient based on the sub-patterns in the polynomial’s coefficients. Finally, we show how we can further reduce the error coefficient in the cosets.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"487-498"},"PeriodicalIF":0.0,"publicationDate":"2023-09-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50426700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Securely Aggregated Coded Matrix Inversion","authors":"Neophytos Charalambides;Mert Pilanci;Alfred O. Hero","doi":"10.1109/JSAIT.2023.3312233","DOIUrl":"10.1109/JSAIT.2023.3312233","url":null,"abstract":"Coded computing is a method for mitigating straggling workers in a centralized computing network, by using erasure-coding techniques. Federated learning is a decentralized model for training data distributed across client devices. In this work we propose approximating the inverse of an aggregated data matrix, where the data is generated by clients, similar to the federated learning paradigm, while also being resilient to stragglers. To do so, we propose a coded computing method based on gradient coding. We modify this method so that the coordinator does not access the local data at any point, while the clients access the aggregated matrix in order to complete their tasks. The network we consider is not centrally administrated, and the communications which take place are secure against potential eavesdroppers.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"405-419"},"PeriodicalIF":0.0,"publicationDate":"2023-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46029173","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Randomized Polar Codes for Anytime Distributed Machine Learning","authors":"Burak Bartan;Mert Pilanci","doi":"10.1109/JSAIT.2023.3310931","DOIUrl":"10.1109/JSAIT.2023.3310931","url":null,"abstract":"We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations. The proposed mechanism integrates the concepts of randomized sketching and polar codes in the context of coded computation. We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery. Additionally, we provide an anytime estimator that can generate provably accurate estimates even when the set of available node outputs is not decodable. We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization. We present the implementation of these methods on a serverless cloud computing system and provide numerical results to demonstrate their scalability in practice, including ImageNet scale computations.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"393-404"},"PeriodicalIF":0.0,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43691019","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Distributed Matrix Computations With Low-Weight Encodings","authors":"Anindya Bijoy Das;Aditya Ramamoorthy;David J. Love;Christopher G. Brinton","doi":"10.1109/JSAIT.2023.3308768","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3308768","url":null,"abstract":"Straggler nodes are well-known bottlenecks of distributed matrix computations which induce reductions in computation/communication speeds. A common strategy for mitigating such stragglers is to incorporate Reed-Solomon based MDS (maximum distance separable) codes into the framework; this can achieve resilience against an optimal number of stragglers. However, these codes assign dense linear combinations of submatrices to the worker nodes. When the input matrices are sparse, these approaches increase the number of non-zero entries in the encoded matrices, which in turn adversely affects the worker computation time. In this work, we develop a distributed matrix computation approach where the assigned encoded submatrices are random linear combinations of a small number of submatrices. In addition to being well suited for sparse input matrices, our approach continues to have the optimal straggler resilience in a certain range of problem parameters. Moreover, compared to recent sparse matrix computation approaches, the search for a “good” set of random coefficients to promote numerical stability in our method is much more computationally efficient. We show that our approach can efficiently utilize partial computations done by slower worker nodes in a heterogeneous system which can enhance the overall computation speed. 
Numerical experiments conducted through Amazon Web Services (AWS) demonstrate up to 30% reduction in per worker node computation time and $100\\times$ faster encoding compared to the available methods.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"363-378"},"PeriodicalIF":0.0,"publicationDate":"2023-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50427091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Channel Coding at Low Capacity","authors":"Mohammad Fereydounian;Hamed Hassani;Mohammad Vahid Jamali;Hessam Mahdavifar","doi":"10.1109/JSAIT.2023.3305874","DOIUrl":"10.1109/JSAIT.2023.3305874","url":null,"abstract":"Low-capacity scenarios have become increasingly important in the technology of the Internet of Things (IoT) and the next generation of wireless networks. Such scenarios require efficient and reliable transmission over channels with an extremely small capacity. Within these constraints, the state-of-the-art coding techniques may not be directly applicable. Moreover, the prior work on the finite-length analysis of optimal channel coding provides inaccurate predictions of the limits in the low-capacity regime. In this paper, we study channel coding at low capacity from two perspectives: fundamental limits at finite length and code constructions. We first specify what a low-capacity regime means. We then characterize finite-length fundamental limits of channel coding in the low-capacity regime for various types of channels, including binary erasure channels (BECs), binary symmetric channels (BSCs), and additive white Gaussian noise (AWGN) channels. From the code construction perspective, we characterize the optimal number of repetitions for transmission over binary memoryless symmetric (BMS) channels, in terms of the code blocklength and the underlying channel capacity, such that the capacity loss due to the repetition is negligible. 
Furthermore, it is shown that capacity-achieving polar codes naturally adopt the aforementioned optimal number of repetitions.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"351-362"},"PeriodicalIF":0.0,"publicationDate":"2023-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42302948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Continuous-Time Distributed Filtering With Sensing and Communication Constraints","authors":"Zhenyu Liu;Andrea Conti;Sanjoy K. Mitter;Moe Z. Win","doi":"10.1109/JSAIT.2023.3304249","DOIUrl":"10.1109/JSAIT.2023.3304249","url":null,"abstract":"Distributed filtering is crucial in many applications such as localization, radar, autonomy, and environmental monitoring. The aim of distributed filtering is to infer time-varying unknown states using data obtained via sensing and communication in a network. This paper analyzes continuous-time distributed filtering with sensing and communication constraints. In particular, the paper considers a building-block system of two nodes, where each node is tasked with inferring a time-varying unknown state. At each time, the two nodes obtain noisy observations of the unknown states via sensing and perform communication via a Gaussian feedback channel. The distributed filter of the unknown state is computed based on both the sensor observations and the received messages. We analyze the asymptotic performance of the distributed filter by deriving a necessary and sufficient condition of the sensing and communication capabilities under which the mean-square error of the distributed filter is bounded over time. Numerical results are presented to validate the derived necessary and sufficient condition.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"667-681"},"PeriodicalIF":0.0,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"62353985","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Implementation of Boolean Functions on Content-Addressable Memories","authors":"Ron M. Roth","doi":"10.1109/JSAIT.2023.3279333","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3279333","url":null,"abstract":"Let $[q\\rangle$ denote the integer set $\\{0,1,\\ldots,q-1\\}$ and let $\\mathbb{B}=\\{0,1\\}$. The problem of implementing functions $[q\\rangle \\rightarrow \\mathbb{B}$ on content-addressable memories (CAMs) is considered. CAMs can be classified by the input alphabet and the state alphabet of their cells; for example, in binary CAMs, those alphabets are both $\\mathbb{B}$, while in a ternary CAM (TCAM), both alphabets are endowed with a “don’t care” symbol. This work is motivated by recent proposals for using CAMs for fast inference on decision trees. In such learning models, the tree nodes carry out integer comparisons, such as testing equality ($x=t$?) or inequality ($x \\le t$?), where $x \\in [q\\rangle$ is an input to the node and $t \\in [q\\rangle$ is a node parameter. 
A CAM implementation of such comparisons includes mapping (i.e., encoding) $t$ into internal states of some number $n$ of cells and mapping $x$ into inputs to these cells, with the goal of minimizing $n$. Such mappings are presented for various comparison families, as well as for the set of all functions $[q\\rangle \\rightarrow \\mathbb{B}$, under several scenarios of input and state alphabets of the CAM cells. All those mappings are shown to be optimal in that they attain the smallest possible $n$ for any given $q$.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"379-392"},"PeriodicalIF":0.0,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50354870","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Genomic Compression With Read Alignment at the Decoder","authors":"Yotam Gershon;Yuval Cassuto","doi":"10.1109/JSAIT.2023.3300831","DOIUrl":"10.1109/JSAIT.2023.3300831","url":null,"abstract":"We propose a new compression scheme for genomic data given as sequence fragments called reads. The scheme uses a reference genome at the decoder side only, freeing the encoder from the burdens of storing references and performing computationally costly alignment operations. The main ingredient of the scheme is a multi-layer code construction, delivering to the decoder sufficient information to align the reads, correct their differences from the reference, validate their reconstruction, and correct reconstruction errors. The core of the method is the well-known concept of distributed source coding with decoder side information, fortified by a generalized-concatenation code construction enabling efficient embedding of all the information needed for reliable reconstruction. We first present the scheme for the case of substitution errors only between the reads and the reference, and then extend it to support reads with a single deletion and multiple substitutions. A central tool in this extension is a new distance metric that is shown analytically to improve alignment performance over existing distance metrics.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"314-330"},"PeriodicalIF":0.0,"publicationDate":"2023-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42367267","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning-Aided Efficient Decoding of Reed–Muller Subcodes","authors":"Mohammad Vahid Jamali;Xiyang Liu;Ashok Vardhan Makkuva;Hessam Mahdavifar;Sewoong Oh;Pramod Viswanath","doi":"10.1109/JSAIT.2023.3298362","DOIUrl":"10.1109/JSAIT.2023.3298362","url":null,"abstract":"Reed-Muller (RM) codes achieve the capacity of general binary-input memoryless symmetric channels and are conjectured to have a comparable performance to that of random codes in terms of scaling laws. However, such results are established assuming maximum-likelihood decoders for general code parameters. Also, RM codes only admit limited sets of rates. Efficient decoders such as successive cancellation list (SCL) decoder and recently-introduced recursive projection-aggregation (RPA) decoders are available for RM codes at finite lengths. In this paper, we focus on subcodes of RM codes with flexible rates. We first extend the RPA decoding algorithm to RM subcodes. To lower the complexity of our decoding algorithm, referred to as subRPA, we investigate different approaches to prune the projections. Next, we derive the soft-decision based version of our algorithm, called soft-subRPA, that not only improves upon the performance of subRPA but also enables a differentiable decoding algorithm. Building upon the soft-subRPA algorithm, we then provide a framework for training a machine learning (ML) model to search for good sets of projections that minimize the decoding error rate. Training our ML model enables achieving very close to the performance of full-projection decoding with a significantly smaller number of projections. 
We also show that the choice of the projections in decoding RM subcodes matters significantly, and our ML-aided projection pruning scheme is able to find a good selection, i.e., with negligible performance degradation compared to the full-projection case, given a reasonable number of projections.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"260-275"},"PeriodicalIF":0.0,"publicationDate":"2023-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47958050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}