Jayati Singh, Ignacio Sañudo Olmedo, Nicola Capodieci, A. Marongiu, M. Caccamo
{"title":"Reconciling QoS and Concurrency in NVIDIA GPUs via Warp-Level Scheduling","authors":"Jayati Singh, Ignacio Sañudo Olmedo, Nicola Capodieci, A. Marongiu, M. Caccamo","doi":"10.23919/DATE54114.2022.9774761","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774761","url":null,"abstract":"The widespread deployment of NVIDIA GPUs in latency-sensitive systems today requires predictable GPU multi-tasking, which cannot be trivially achieved. The NVIDIA CUDA API allows programmers to easily exploit the processing power provided by these massively parallel accelerators and is one of the major reasons behind their ubiquity. However, NVIDIA GPUs and the CUDA programming model favor throughput instead of latency and timing predictability. Hence, providing real-time and quality-of-service (QoS) properties to GPU applications presents an interesting research challenge. Such a challenge is paramount when considering simultaneous multikernel (SMK) scenarios, wherein kernels are executed concurrently within each streaming multiprocessor (SM). In this work, we explore QoS-based fine-grained multitasking in SMK via job arbitration at the lowest level of the GPU scheduling hierarchy, i.e., between warps. We present QoS-aware warp scheduling (QAWS) and evaluate it against state-of-the-art, kernel-agnostic policies seen in NVIDIA hardware today. Since the NVIDIA ecosystem lacks a mechanism to specify and enforce kernel priority at the warp granularity, we implement and evaluate our proposed warp scheduling policy on GPGPU-Sim. QAWS not only improves the response time of the higher priority tasks but also has comparable or better throughput than the state-of-the-art policies.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128844354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Remote Sensing with UAV and Mobile Recharging Vehicle Rendezvous","authors":"M. Ostertag, Jason Ma, T. Simunic","doi":"10.23919/DATE54114.2022.9774645","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774645","url":null,"abstract":"Small unmanned aerial vehicles (UAVs) equipped with sensors offer an effective way to perform high-resolution environmental monitoring in remote areas but suffer from limited battery life. In order to perform large-scale remote sensing, a UAV must cover the area using multiple discharge cycles. A practical and efficient method to achieve full coverage is for the sensing UAV to rendezvous with a mobile recharge vehicle (MRV) for a battery exchange, which is an NP-hard problem. Existing works tackle this problem using slow genetic algorithms or greedy heuristics. We propose an alternative approach: a two-stage algorithm that iterates between dividing a region into independent subregions aligned to MRV travel and a new diffusion heuristic that performs a local exchange of points of interest between neighboring subregions. The algorithm outperforms existing state-of-the-art planners for remote sensing applications, creating more fuel efficient paths that better align with MRV travel.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126976874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Parya Zolfaghari, Joel Ortiz, C. Killian, S. L. Beux
{"title":"Non-Volatile Phase Change Material based Nanophotonic Interconnect","authors":"Parya Zolfaghari, Joel Ortiz, C. Killian, S. L. Beux","doi":"10.23919/DATE54114.2022.9774648","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774648","url":null,"abstract":"Integrated optics is a promising technology to take advantage of light propagation for high throughput chip-scale interconnects in many core architectures. A key challenge for the deployment of nanophotonic interconnects is their high static power, which is induced by signal losses and devices calibration. To tackle this challenge, we propose to use Phase Change Material (PCM) to configure optical paths between writers and readers. The non-volatility of PCM elements and the high contrast between crystalline and amorphous phase states allow to bypass unused readers, thus reducing losses and calibration requirements. We evaluate the efficiency of the proposed PCM-based interconnects using system level simulations carried out with SNIPER manycore simulator. For this purpose, we have modified the simulator to partition clusters according to executed applications. Simulation results show that bypassing readers using PCM leads up to 52% communication power saving.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"137 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115958449","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bernhard Lippmann, A. Bette, Matthias Ludwig, Johannes Mutter, Johanna Baehr, Alexander Hepp, H. Gieser, Nicola Kovač, Tobias Zweifel, M. Rasche, Oliver Kellermann
{"title":"Physical and Functional Reverse Engineering Challenges for Advanced Semiconductor Solutions","authors":"Bernhard Lippmann, A. Bette, Matthias Ludwig, Johannes Mutter, Johanna Baehr, Alexander Hepp, H. Gieser, Nicola Kovač, Tobias Zweifel, M. Rasche, Oliver Kellermann","doi":"10.23919/DATE54114.2022.9774610","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774610","url":null,"abstract":"Motivated by the threats of malicious modification and piracy arising from worldwide distributed supply chains, the goal of RESEC is the creation, verification, and optimization of a complete reverse engineering process for integrated circuits manufactured in technology nodes of 40nm and below. Building upon the presentation of individual reverse engineering process stages, this paper connects analysis efforts and yields with their impact on hardware security, demonstrated on a design with implemented experimental hardware Trojans. We outline the interim stage of our research activities and present our future targets linking chip design and physical verification processes.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126005812","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniela Kaufmann, P. Beame, Armin Biere, J. Nordström
{"title":"Adding Dual Variables to Algebraic Reasoning for Gate-Level Multiplier Verification","authors":"Daniela Kaufmann, P. Beame, Armin Biere, J. Nordström","doi":"10.23919/DATE54114.2022.9774587","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774587","url":null,"abstract":"Algebraic reasoning has proven to be one of the most effective approaches for verifying gate-level integer mul-tipliers, but it struggles with certain components, necessitating the complementary use of SAT solvers. For this reason validation certificates require proofs in two different formats. Approaches to unify the certificates are not scalable, meaning that the validation results can only be trusted up to the correctness of compositional reasoning. We show in this paper that using dual variables in the algebraic encoding, together with a novel tail substitution and carry rewriting method, removes the need for SAT solvers in the verification flow and yields a single, uniform proof certificate.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125188131","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kyler R. Scott, Cheng-Yen Lee, S. Khatri, S. Vrudhula
{"title":"A Flash-based Current-mode IC to Realize Quantized Neural Networks","authors":"Kyler R. Scott, Cheng-Yen Lee, S. Khatri, S. Vrudhula","doi":"10.23919/DATE54114.2022.9774539","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774539","url":null,"abstract":"This paper presents a mixed-signal architecture for implementing Quantized Neural Networks (QNNs) using flash transistors to achieve extremely high throughput with extremely low power, energy and memory requirements. Its low resource consumption makes our design especially suited for use in edge devices. The network weights are stored in-memory using flash transistors, and nodes perform operations in the analog current domain. Our design can be programmed with any QNN whose hyperparameters (the number of layers, filters, or filter size, etc) do not exceed the maximum provisioned. Once the flash devices are programmed with a trained model and the IC is given an input, our architecture performs inference with zero access to off-chip memory. We demonstrate the robustness of our design under current-mode non-linearities arising from process and voltage variations. We test validation accuracy on the ImageNet dataset, and show that our IC suffers only 0.6% and 1.0% reduction in classification accuracy for Top-1 and Top-5 outputs, respectively. Our implementation results in a $sim boldsymbol {50}times$ reduction in latency and energy when compared to a recently published mixed-signal ASIC implementation, with similar power characteristics. Our approach provides layer partitioning and node sharing possibilities, which allow us to trade off latency, power, and area amongst each other.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"46 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126524387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Cristian Turetta, Florenc Demrozi, Philipp H. Kindt, Alejandro Masrur, G. Pravadelli
{"title":"Practical identity recognition using WiFi's Channel State Information","authors":"Cristian Turetta, Florenc Demrozi, Philipp H. Kindt, Alejandro Masrur, G. Pravadelli","doi":"10.23919/DATE54114.2022.9774772","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774772","url":null,"abstract":"Identity recognition is increasingly used to control access to sensitive data, restricted areas in industrial, healthcare, and defense settings, as well as in consumer electronics. To this end, existing approaches are typically based on collecting and analyzing biometric data and imply severe privacy con-cerns. Particularly when cameras are involved, users might even reject or dismiss an identity recognition system. Furthermore, iris or fingerprint scanners, cameras, microphones, etc., imply installation and maintenance costs and require the user's active participation in the recognition procedure. This paper proposes a non-intrusive identity recognition system based on analyzing WiFi's Channel State Information (CSI). We show that CSI data attenuated by a person's body and typical movements allows for a reliable identification - even in a sitting posture. We further propose a lightweight deep learning algorithm trained using CSI data, which we implemented and evaluated on an embedded platform (i.e., a Raspberry Pi 4B). Our results obtained using real-world experiments suggest a high accuracy in recognizing people's identity, with a specificity of 98% and a sensitivity of 99%, while requiring a low training effort and negligible cost.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124424361","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Shirui Zhao, Nimish Shah, Wannes Meert, M. Verhelst
{"title":"Discrete Samplers for Approximate Inference in Probabilistic Machine Learning","authors":"Shirui Zhao, Nimish Shah, Wannes Meert, M. Verhelst","doi":"10.23919/DATE54114.2022.9774623","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774623","url":null,"abstract":"Probabilistic reasoning models (PMs) and probabilistic inference bring advantages when dealing with small datasets or uncertainty on the observed data, and allow to integrate expert knowledge and create interpretable models. The main challenge of using these PMs in practice is that their inference is very compute-intensive. Therefore, custom hardware architectures for the exact and approximate inference of PMs have been proposed in the SotA. The throughput, energy and area efficiency of approximate PM inference accelerators are strongly dominated by the sampler blocks required to sample arbitrary discrete distributions. This paper proposes and studies novel discrete sampler architectures towards efficient and flexible hardware implementations for PM accelerators. Both cumulative distribution table (CDT) and Knuth-Yao (KY) based sampling algorithms are assessed, based on which different sampler hardware architectures were implemented. Innovation is brought in terms of a reconfigurable CDT sampling architecture with a flexible range and a reconfigurable Knuth-Yao sampling architecture that supports both flexible range and dynamic precision. All architectures are benchmarked on real-world Bayesian Networks, demonstrating up to 13 × energy efficiency benefits and 11 × area efficiency improvement of the optimized reconfigurable Knuth-Yao sampler over the traditional linear CDT-based samplers used in the PM SotA.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"87 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127452615","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Space and Power Reduction in BDD-based Optical Logic Circuits Exploiting Dual Ports","authors":"R. Matsuo, S. Minato","doi":"10.23919/DATE54114.2022.9774637","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774637","url":null,"abstract":"Optical logic circuits based on integrated nanophotonics have attracted significant interest due to their ultra-high-speed operation. A synthesis method based on the Binary Decision Diagram (BDD) has been studied, as BDD-based optical logic circuits can take advantage of the speed of light. However, a fundamental disadvantage of BDD-based optical logic circuits is a large number of splitters, which results in large power consumption. In BDD-based circuits a dual port of each logic gate is not used. We propose a method for eliminating a splitter exploiting this dual port. We define a BDD node corresponding to a dual port as a dual port node (DP node) and call the proposed method DP node sharing. We demonstrated that DP node sharing significantly reduces the power consumption and to a lesser extent circuit size without increasing delay. We conducted an experiment involving 10-input logic functions obtained by applying an LUT technology mapper to an ISCSA'85 C7552 benchmark circuit to evaluate our DP node sharing. The experimental results demonstrated that DP node sharing reduces the power consumption by two orders of magnitude of circuit that consume a large amount of power.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130378053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards Approximate Computing for Achieving Energy vs. Accuracy Trade-offs","authors":"A. Ometov, J. Nurmi","doi":"10.23919/DATE54114.2022.9774538","DOIUrl":"https://doi.org/10.23919/DATE54114.2022.9774538","url":null,"abstract":"Despite the recent advances in semiconductor technology and energy-aware system design, the overall energy consumption of computing and communication systems is rapidly growing. On the one hand, the pervasiveness of these technologies everywhere in the form of mobile devices, cyber-physical embedded systems, sensor networks, wearables, social media and context-awareness, intelligent machines, broadband cellular networks, Cloud computing, and Internet of Things (IoT) has drastically increased the demand for computing and communications. On the other hand, the user expectations on features and battery life of online devices are increasing all the time, and it creates another incentive for finding good trade-offs between performance and energy consumption. One of the opportunities to address this growing demand is to utilize an Approximate Computing approach through software and hardware design. The APROPOS project aims at finding the balance between accuracy and energy consumption, and this short paper provides an initial overview of the corresponding roadmap, as the project is still in the initial stage.","PeriodicalId":232583,"journal":{"name":"2022 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"56 3 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-03-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116535695","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}