{"title":"Learning Invariant Representations Under General Interventions on the Response","authors":"Kang Du;Yu Xiang","doi":"10.1109/JSAIT.2023.3328651","DOIUrl":"10.1109/JSAIT.2023.3328651","url":null,"abstract":"It has become increasingly common nowadays to collect observations of feature and response pairs from different environments. As a consequence, one has to apply learned predictors to data with a different distribution due to distribution shifts. One principled approach is to adopt the structural causal models to describe training and test models, following the invariance principle which says that the conditional distribution of the response given its predictors remains the same across environments. However, this principle might be violated in practical settings when the response is intervened. A natural question is whether it is still possible to identify other forms of invariance to facilitate prediction in unseen environments. To shed light on this challenging scenario, we focus on linear structural causal models (SCMs) and introduce invariant matching property (IMP), an explicit relation to capture interventions through an additional feature, leading to an alternative form of invariance that enables a unified treatment of general interventions on the response as well as the predictors. We analyze the asymptotic generalization errors of our method under both the discrete and continuous environment settings, where the continuous case is handled by relating it to the semiparametric varying coefficient models. 
We present algorithms that show competitive performance compared to existing methods over various experimental settings including a COVID dataset.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"808-819"},"PeriodicalIF":0.0,"publicationDate":"2023-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134887604","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Timely Multi-Process Estimation Over Erasure Channels With and Without Feedback: Signal-Independent Policies","authors":"Karim Banawan;Ahmed Arafa;Karim G. Seddik","doi":"10.1109/JSAIT.2023.3329431","DOIUrl":"10.1109/JSAIT.2023.3329431","url":null,"abstract":"We consider a multi-process remote estimation system observing $K$ independent Ornstein-Uhlenbeck processes. In this system, a shared sensor samples the $K$ processes in such a way that the long-term average sum mean square error (MSE) is minimized using signal-independent sampling policies, in which sampling instances are chosen independently from the processes’ values. The sensor operates under a total sampling frequency constraint $f_{\\max}$. The samples from all processes consume random processing delays in a shared queue and then are transmitted over an erasure channel with probability $\\epsilon$. We study two variants of the problem: first, when the samples are scheduled according to a Maximum-Age-First (MAF) policy, and the receiver provides an erasure status feedback; and second, when samples are scheduled according to a Round-Robin (RR) policy, when there is no erasure status feedback from the receiver. Aided by optimal structural results, we show that the optimal sampling policy for both settings, under some conditions, is a threshold policy. 
We characterize the optimal threshold and the corresponding optimal long-term average sum MSE as a function of $K$, $f_{\\max}$, $\\epsilon$, and the statistical properties of the observed processes. Our results show that, with an exponentially distributed service rate, the optimal threshold $\\tau^{\\ast}$ increases as the number of processes $K$ increases, for both settings. Additionally, we show that the optimal threshold is an increasing function of $\\epsilon$ in the case of available erasure status feedback, while it exhibits the opposite behavior, i.e., $\\tau^{\\ast}$ is a decreasing function of $\\epsilon$, in the case of absent erasure status feedback.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"607-623"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135362790","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Partial Homoscedasticity in Causal Discovery With Linear Models","authors":"Jun Wu;Mathias Drton","doi":"10.1109/JSAIT.2023.3328476","DOIUrl":"10.1109/JSAIT.2023.3328476","url":null,"abstract":"Recursive linear structural equation models and the associated directed acyclic graphs (DAGs) play an important role in causal discovery. The classic identifiability result for this class of models states that when only observational data is available, each DAG can be identified only up to a Markov equivalence class. In contrast, recent work has shown that the DAG can be uniquely identified if the errors in the model are homoscedastic, i.e., all have the same variance. This equal variance assumption yields methods that, if appropriate, are highly scalable and also sheds light on fundamental information-theoretic limits and optimality in causal discovery. In this paper, we fill the gap that exists between the two previously considered cases, which assume the error variances to be either arbitrary or all equal. Specifically, we formulate a framework of partial homoscedasticity, in which the variables are partitioned into blocks and each block shares the same error variance. For any such groupwise equal variances assumption, we characterize when two DAGs give rise to identical Gaussian linear structural equation models. Furthermore, we show how the resulting distributional equivalence classes may be represented using a completed partially directed acyclic graph (CPDAG), and we give an algorithm to efficiently construct this CPDAG. 
In a simulation study, we demonstrate that greedy search provides an effective way to learn the CPDAG and exploit partial knowledge about homoscedasticity of errors in structural equation models.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"639-650"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10304270","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135360763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Energy Minimization Under a Peak Age of Information Constraint","authors":"Kumar Saurav;Rahul Vaze","doi":"10.1109/JSAIT.2023.3329034","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3329034","url":null,"abstract":"We consider a node where packets of fixed size (in bits) are generated at arbitrary intervals. The node is required to maintain the peak age of information (AoI) at the monitor below a threshold by transmitting potentially a subset of the generated packets. At any time, depending on the packet availability and the current AoI, the node can choose which packet to transmit, and at what transmission speed (in bits per second). Power consumption is a monotonically increasing convex function of the transmission speed. In this paper, for any given time horizon, the objective is to find a causal policy that minimizes the total energy consumption while satisfying the peak AoI constraint. We consider the competitive ratio as the performance metric, defined as the ratio of the expected cost of a causal policy to the expected cost of an optimal offline policy that knows the input (packet generation times) in advance. 
We first derive a lower bound on the competitive ratio of all causal policies, in terms of the system parameters (such as power function, packet size and peak AoI threshold), and then propose a particular policy for which we show that its competitive ratio has similar order of dependence on the system parameters as the derived lower bound.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"579-590"},"PeriodicalIF":0.0,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138431090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Statistical Age-of-Information Bounds for Parallel Systems: When Do Independent Channels Make a Difference?","authors":"Markus Fidler;Jaya Prakash Champati;Joerg Widmer;Mahsa Noroozi","doi":"10.1109/JSAIT.2023.3328766","DOIUrl":"10.1109/JSAIT.2023.3328766","url":null,"abstract":"This paper contributes tail bounds of the age-of-information of a general class of parallel systems and explores their potential. Parallel systems arise in relevant cases, such as in multi-band mobile networks, multi-technology wireless access, or multi-path protocols, just to name a few. Typically, control over each communication channel is limited and random service outages and congestion cause buffering that impairs the age-of-information. The parallel use of independent channels promises a remedy, since outages on one channel may be compensated for by another. Surprisingly, for the well-known case of $\\text{M}\\mid \\text{M}\\mid 1$ queues we find the opposite: pooling capacity in one channel performs better than a parallel system with the same total capacity. A generalization is not possible since there are no solutions for other types of parallel queues at hand. In this work, we prove a dual representation of age-of-information in min-plus algebra that connects to queueing models known from the theory of effective bandwidth/capacity and the stochastic network calculus. Exploiting these methods, we derive tail bounds of the age-of-information of $\\text{G}\\mid \\text{G}\\mid 1$ queues. Tail bounds of the age-of-information of independent parallel queues follow readily. In addition to parallel classical queues, we investigate Markov channels where, depending on the memory of the channel, we show the true advantage of parallel systems. 
We continue to investigate this new finding and provide insight into when capacity should be pooled in one channel or when independent parallel channels perform better. We complement our analysis with simulation results and evaluate different update policies, scheduling policies, and the use of heterogeneous channels that is most relevant for latest multi-band networks.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"591-606"},"PeriodicalIF":0.0,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10302220","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135262828","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Linear Gaussian Polytree Models With Interventions","authors":"Daniele Tramontano;L. Waldmann;M. Drton;Eliana Duarte","doi":"10.1109/JSAIT.2023.3328429","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3328429","url":null,"abstract":"We present a consistent and highly scalable local approach to learn the causal structure of a linear Gaussian polytree using data from interventional experiments with known intervention targets. Our methods first learn the skeleton of the polytree and then orient its edges. The output is a CPDAG representing the interventional equivalence class of the polytree of the true underlying distribution. The skeleton and orientation recovery procedures we use rely on second order statistics and low-dimensional marginal distributions. We assess the performance of our methods under different scenarios in synthetic data sets and apply our algorithm to learn a polytree in a gene expression interventional data set. Our simulation studies demonstrate that our approach is fast, has good accuracy in terms of structural Hamming distance, and handles problems with thousands of nodes.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"569-578"},"PeriodicalIF":0.0,"publicationDate":"2023-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134795150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scheduling to Minimize Age of Information With Multiple Sources","authors":"Kumar Saurav;Rahul Vaze","doi":"10.1109/JSAIT.2023.3322077","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3322077","url":null,"abstract":"Finding an optimal/near-optimal scheduling algorithm to minimize the age of information (AoI) in a multi-source G/G/1 system is well-known to be a hard problem, more so if there is a transmission (energy) cost. In this paper, we consider a multi-source G/G/1 system and the goal is to minimize a weighted sum of the AoI of all sources, subject to an energy cost constraint. We propose a novel doubly randomized non-preemptive scheduling algorithm and show that in the non-preemptive setting, where an update under transmission cannot be preempted, the competitive ratio of the proposed algorithm is at most 3 plus the maximum of the ratio of the variance and the mean of the update inter-generation time distribution of sources. Notably, the competitive ratio is independent of the number of sources, or their service time distributions, and is at most 4 for several common update inter-generation time distributions such as exponential, uniform and Rayleigh. 
For preemptive setting, where an update under transmission can be preempted, we consider a multi-source G/M/1 system and show that the proposed non-preemptive algorithm has competitive ratio at most 5 plus the maximum of the ratio of the variance and the mean of the update inter-generation time distribution of sources.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"539-550"},"PeriodicalIF":0.0,"publicationDate":"2023-10-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"109157409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning and Communications Co-Design for Remote Inference Systems: Feature Length Selection and Transmission Scheduling","authors":"Md Kamran Chowdhury Shisher;Bo Ji;I-Hong Hou;Yin Sun","doi":"10.1109/JSAIT.2023.3322620","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3322620","url":null,"abstract":"In this paper, we consider a remote inference system, where a neural network is used to infer a time-varying target (e.g., robot movement), based on features (e.g., video clips) that are progressively received from a sensing node (e.g., a camera). Each feature is a temporal sequence of sensory data. The inference error is determined by (i) the timeliness and (ii) the sequence length of the feature, where we use Age of Information (AoI) as a metric for timeliness. While a longer feature can typically provide better inference performance, it often requires more channel resources for sending the feature. To minimize the time-averaged inference error, we study a learning and communication co-design problem that jointly optimizes feature length selection and transmission scheduling. When there is a single sensor-predictor pair and a single channel, we develop low-complexity optimal co-designs for both the cases of time-invariant and time-variant feature length. When there are multiple sensor-predictor pairs and multiple channels, the co-design problem becomes a restless multi-arm multi-action bandit problem that is PSPACE-hard. For this setting, we design a low-complexity algorithm to solve the problem. 
Trace-driven evaluations demonstrate the potential of these co-designs to reduce inference error by up to 10000 times.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"524-538"},"PeriodicalIF":0.0,"publicationDate":"2023-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50354744","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cache-Aided Communication Schemes via Combinatorial Designs and Their q-Analogs","authors":"Shailja Agrawal;K. V. Sushena Sree;Prasad Krishnan;Abhinav Vaishya;Srikar Kale","doi":"10.1109/JSAIT.2023.3320068","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3320068","url":null,"abstract":"We consider the standard broadcast setup with a single server broadcasting information to a number of clients, each of which contains local storage (called cache) of some size, which can store some parts of the available files at the server. The centralized coded caching framework consists of a caching phase and a delivery phase, both of which are carefully designed in order to use the cache and the channel together optimally. In prior literature, various combinatorial structures have been used to construct coded caching schemes. One of the chief drawbacks of many of these existing constructions is the large subpacketization level, which denotes the number of times a file should be split for the schemes to provide coding gain. In this work, using a new binary matrix model, we present several novel constructions for coded caching based on the various types of combinatorial designs and their $q$-analogs, which are also called subspace designs. While most of the schemes constructed in this work (based on existing designs) have a high cache requirement, they provide a rate that is either constant or decreasing, and moreover require competitively small levels of subpacketization, which is an extremely important feature in practical applications of coded caching. We also apply our constructions to the distributed computing framework of MapReduce, which consists of three phases, the Map phase, the Shuffle phase and the Reduce phase. Using our binary matrix framework, we present a new simple generic coded data shuffling scheme. 
Employing our designs-based constructions in conjunction with this new shuffling scheme, we obtain new coded computing schemes which have low file complexity, with marginally higher communication load compared to the optimal scheme for equivalent parameters. We show that our schemes can neatly extend to the scenario with full and partial stragglers also.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"551-568"},"PeriodicalIF":0.0,"publicationDate":"2023-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134795149","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Revisit of Linear Network Error Correction Coding","authors":"Xuan Guang;Raymond W. Yeung","doi":"10.1109/JSAIT.2023.3317941","DOIUrl":"https://doi.org/10.1109/JSAIT.2023.3317941","url":null,"abstract":"We consider linear network error correction (LNEC) coding when errors may occur on the edges of a communication network of which the topology is known. In this paper, we first present a framework of additive adversarial network for LNEC coding, and then prove the equivalence of two well-known LNEC coding approaches, which can be unified under this framework. Furthermore, by developing a graph-theoretic approach, we obtain a significantly enhanced characterization of the error correction capability of LNEC codes in terms of the minimum distances at the sink nodes. Specifically, in order to ensure that an LNEC code can correct up to $r$ errors at a sink node $t$, it suffices to ensure that this code can correct every error vector in a reduced set of error vectors; and on the other hand, this LNEC code in fact can correct every error vector in an enlarged set of error vectors. In general, the size of this reduced set is considerably smaller than the number of error vectors with Hamming weight not larger than $r$, and the size of this enlarged set is considerably larger than the same number. 
This result has the important implication that the computational complexities for decoding and for code construction can be significantly reduced.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"4 ","pages":"514-523"},"PeriodicalIF":0.0,"publicationDate":"2023-09-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50354743","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}