{"title":"SCRaM – State-Consistent Replication Management for Networked Control Systems","authors":"Ben W. Carabelli, Frank Dürr, K. Rothermel","doi":"10.1109/ICCPS48487.2020.00035","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00035","url":null,"abstract":"Networked control systems (NCS) consist of sensors and actuators that are connected to a controller through a packet-switched network in a feedback loop to control physical systems in diverse application areas such as industry, automotive, or power infrastructure. The control of critical real-time systems places strong requirements on the latency and reliability of both the communication network and the controller. In this paper, we consider the problem of increasing the reliability of an NCS subject to crash failures and message loss by replicating the controller component. Previous replication schemes for real-time systems have focused on ensuring that no conflicting values are sent to the actuators by different replicas. Since this property, which we call output consistency, only refers to the values within one time step, it is insufficient for reasoning about the formal conditions under which a group of replicated controllers behaves equivalently to a non-replicated controller. Therefore, we propose the stronger state consistency property, which ensures that the sequence of values produced by the replicated controller exhibits the same dynamical behaviour as a non-replicated controller. Moreover, we present SCRaM, a protocol for replicating generic periodically sampled controllers that satisfies both of these consistency requirements.
To demonstrate the effectiveness of our approach, we evaluated it experimentally for the control of a cart-driven inverted pendulum.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117203898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Quickest Detection of Advanced Persistent Threats: A Semi-Markov Game Approach","authors":"D. Sahabandu, Joey Allen, Shana Moothedath, L. Bushnell, Wenke Lee, R. Poovendran","doi":"10.1109/ICCPS48487.2020.00009","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00009","url":null,"abstract":"Advanced Persistent Threats (APTs) are stealthy, sophisticated, long-term, multi-stage attacks that threaten the security of sensitive information. Dynamic Information Flow Tracking (DIFT) has been proposed as a promising mechanism to detect and prevent various cyber attacks in computer systems. DIFT tracks suspicious information flows in the system and generates a security analysis when anomalous behavior is detected. The number of information flows in a system is typically large, and the amount of resources (such as memory, processing power, and storage) required for analyzing different flows at different system locations varies. Hence, efficient use of resources is essential to maintain an acceptable level of system performance when using DIFT. On the other hand, the quickest detection of APTs is crucial, as APTs are persistent and the damage caused to the system increases the longer the attacker remains in the system. We address the problem of detecting APTs and model the trade-off between resource efficiency and quickest detection of APTs. We propose a game model that captures the interaction of an APT and a DIFT-based defender as a two-player, multi-stage, zero-sum, Stackelberg semi-Markov game. Our game considers performance parameters such as false negatives generated by DIFT and the time required for executing various operations in the system. We propose a two-time scale Q-learning algorithm that converges to a Stackelberg equilibrium under infinite horizon, limiting average payoff criteria.
We validate our model and algorithm on a real-world attack dataset obtained using the Refinable Attack INvestigation (RAIN) framework.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128476675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICCPS 2020 Commentary","authors":"","doi":"10.1109/iccps48487.2020.00002","DOIUrl":"https://doi.org/10.1109/iccps48487.2020.00002","url":null,"abstract":"","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"66 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130425061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ICCPS 2020 Index","authors":"","doi":"10.1109/iccps48487.2020.00038","DOIUrl":"https://doi.org/10.1109/iccps48487.2020.00038","url":null,"abstract":"","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133595681","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model-Based Design of Closed Loop Deep Brain Stimulation Controller using Reinforcement Learning","authors":"Qitong Gao, Michael Naumann, Ilija Jovanov, Vuk Lesi, Karthik Kumaravelu, W. Grill, M. Pajic","doi":"10.1109/ICCPS48487.2020.00018","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00018","url":null,"abstract":"Parkinson’s disease (PD) currently affects around one million people in the US. Deep brain stimulation (DBS) is a surgical treatment for the motor symptoms of PD that delivers electrical stimulation to the basal ganglia (BG) region of the brain. Existing commercial DBS devices employ stimulation based only on fixed-frequency periodic pulses. While such periodic high-frequency DBS controllers provide effective relief of PD symptoms, they are very inefficient in terms of energy consumption, and the lifetime of these battery-operated devices is limited to 4 years. Furthermore, fixed high-frequency stimulation may have side effects, such as speech impairment. Consequently, there is a need to move beyond (1) fixed stimulation pulse controllers, and (2) ‘one-size-fits-all’ patient-agnostic treatments, to provide energy-efficient and effective (in terms of relieving PD symptoms) DBS controllers. In this work, we introduce a deep reinforcement learning (RL)-based approach that can derive patient-specific DBS patterns that are both effective in reducing a model-based proxy for PD symptoms and energy-efficient. Specifically, we model the BG regions as a Markov decision process (MDP), and define the state and action space as the state of the neurons in the BG regions and the stimulation patterns, respectively. Thereafter, we define the reward functions over the state space, and the learning objective is set to maximize the accumulated reward over a finite horizon (i.e., the treatment duration), while bounding the average stimulation frequency.
We evaluate the performance of our methodology using a Brain-on-Chip (BoC) FPGA platform that implements the physiologically relevant basal ganglia model (BGM). We show that our RL-based DBS controllers significantly outperform existing fixed-frequency controllers in terms of energy efficiency (e.g., by using 70% less energy than common periodic controllers), while providing a suitable reduction of the model-based proxy for PD symptoms.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127139962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Transfer Reinforcement Learning under Unobserved Contextual Information","authors":"Yan Zhang, M. Zavlanos","doi":"10.1109/ICCPS48487.2020.00015","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00015","url":null,"abstract":"In this paper, we study a transfer reinforcement learning problem where the state transitions and rewards are affected by the environmental context. Specifically, we consider a demonstrator agent that has access to a context-aware policy and can generate transition and reward data based on that policy. These data constitute the experience of the demonstrator. Then, the goal is to transfer this experience, excluding the underlying contextual information, to a learner agent that does not have access to the environmental context, so that it can learn a control policy using fewer samples. It is well known that disregarding the causal effect of the contextual information can introduce bias in the transition and reward models estimated by the learner, resulting in a suboptimal learned policy. To address this challenge, in this paper, we develop a method to obtain causal bounds on the transition and reward functions using the demonstrator’s data, which we then use to obtain causal bounds on the value functions. Using these value function bounds, we propose new Q-learning and UCB-Q learning algorithms that converge to the true value function without bias.
We provide numerical experiments for robot motion planning problems that validate the proposed value function bounds and demonstrate that the proposed algorithms can effectively make use of the data from the demonstrator to accelerate the learning process of the learner.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133064497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal Controller Synthesis for Continuous-Space MDPs via Model-Free Reinforcement Learning","authors":"Abolfazl Lavaei, F. Somenzi, S. Soudjani, Ashutosh Trivedi, Majid Zamani","doi":"10.1109/ICCPS48487.2020.00017","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00017","url":null,"abstract":"A novel reinforcement learning scheme to synthesize policies for continuous-space Markov decision processes (MDPs) is proposed. This scheme enables one to apply model-free, off-the-shelf reinforcement learning algorithms for finite MDPs to compute optimal strategies for the corresponding continuous-space MDPs without explicitly constructing the finite-state abstraction. The proposed approach is based on abstracting the system with a finite MDP (without constructing it explicitly) with unknown transition probabilities, synthesizing strategies over the abstract MDP, and then mapping the results back over the concrete continuous-space MDP with approximate optimality guarantees. The properties of interest for the system belong to a fragment of linear temporal logic, known as syntactically co-safe linear temporal logic (scLTL), and the synthesis requirement is to maximize the probability of satisfaction within a given bounded time horizon. A key contribution of the paper is to leverage the classical convergence results for reinforcement learning on finite MDPs and provide control strategies maximizing the probability of satisfaction over unknown, continuous-space MDPs while providing probabilistic closeness guarantees. Automata-based reward functions are often sparse; we present a novel potential-based reward shaping technique to produce dense rewards to speed up learning.
The effectiveness of the proposed approach is demonstrated by applying it to three physical benchmarks: the regulation of a room’s temperature, the control of a road traffic cell, and a 7-dimensional nonlinear model of a BMW 320i car.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129394754","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Socially-Aware Robot Planning via Bandit Human Feedback","authors":"Xusheng Luo, Yan Zhang, M. Zavlanos","doi":"10.1109/ICCPS48487.2020.00033","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00033","url":null,"abstract":"In this paper, we consider the problem of designing collision-free, dynamically feasible, and socially-aware trajectories for robots operating in environments populated by humans. We define trajectories to be socially-aware if they do not interfere with humans in any way that causes discomfort. In this paper, discomfort is defined broadly and, depending on specific individuals, it can result from the robot being too close to a human or from interfering with human sight or tasks. Moreover, we assume that human feedback is bandit feedback indicating a complaint or no complaint about the part of the robot trajectory that interferes with the humans; it does not reveal any contextual information about the locations of the humans or the reason for a complaint. Finally, we assume that humans can move in the obstacle-free space and, as a result, human utility can change. We formulate this planning problem as an online optimization problem that minimizes the social cost of the time-varying robot trajectory, defined as the total number of incurred human complaints. As the human utility is unknown, we employ zeroth-order, or derivative-free, optimization methods to solve this problem, which we combine with off-the-shelf motion planners to satisfy the dynamic feasibility and collision-free specifications of the resulting trajectories.
To the best of our knowledge, this is a new framework for socially-aware robot planning that is not restricted to avoiding collisions with humans but, instead, focuses on increasing the social value of the robot trajectories using only bandit human feedback.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130778804","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Real-time Out-of-distribution Detection in Learning-Enabled Cyber-Physical Systems","authors":"Feiyang Cai, X. Koutsoukos","doi":"10.1109/ICCPS48487.2020.00024","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00024","url":null,"abstract":"Cyber-physical systems (CPS) benefit greatly from using machine learning components that can handle the uncertainty and variability of the real world. Typical components such as deep neural networks, however, introduce new types of hazards that may impact system safety. The system behavior depends on data that are available only during runtime and may be different from the data used for training. Out-of-distribution data may lead to a large error and compromise safety. The paper considers the problem of efficiently detecting out-of-distribution data in CPS control systems. Detection must be robust and limit the number of false alarms while being computationally efficient for real-time monitoring. The proposed approach leverages inductive conformal prediction and anomaly detection to develop a method that has a well-calibrated false alarm rate. We use variational autoencoders and deep support vector data description to learn models that can be used to efficiently compute the nonconformity of new inputs relative to the training set and enable real-time detection of out-of-distribution high-dimensional inputs. We demonstrate the method using an advanced emergency braking system and a self-driving end-to-end controller implemented in an open-source simulator for self-driving cars.
The simulation results show a very small number of false positives and a short detection delay, while the execution time is comparable to that of the original machine learning components.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132538209","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Control Synthesis for Cyber-Physical Systems to Satisfy Metric Interval Temporal Logic Objectives under Timing and Actuator Attacks*","authors":"Luyao Niu, B. Ramasubramanian, Andrew Clark, L. Bushnell, R. Poovendran","doi":"10.1109/ICCPS48487.2020.00023","DOIUrl":"https://doi.org/10.1109/ICCPS48487.2020.00023","url":null,"abstract":"This paper studies the synthesis of controllers for cyber-physical systems (CPSs) that are required to carry out complex tasks that are time-sensitive, in the presence of an adversary. The task is specified as a formula in metric interval temporal logic (MITL). The adversary is assumed to have the ability to tamper with the control input to the CPS and also manipulate timing information perceived by the CPS. In order to model the interaction between the CPS and the adversary, and also the effect of these two classes of attacks, we define an entity called a durational stochastic game (DSG). DSGs probabilistically capture transitions between states in the environment, and also the time taken for these transitions. With the policy of the defender represented as a finite state controller (FSC), we present a value-iteration based algorithm that computes an FSC that maximizes the probability of satisfying the MITL specification under the two classes of attacks. 
A numerical case-study on a signalized traffic network is presented to illustrate our results.","PeriodicalId":158690,"journal":{"name":"2020 ACM/IEEE 11th International Conference on Cyber-Physical Systems (ICCPS)","volume":"1989-2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2020-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129751063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}