Z. He, Tahereh Miari, Hosein Mohammadi Makrani, Mehrdad Aliasgari, H. Homayoun, H. Sayadi
{"title":"When Machine Learning Meets Hardware Cybersecurity: Delving into Accurate Zero-Day Malware Detection","authors":"Z. He, Tahereh Miari, Hosein Mohammadi Makrani, Mehrdad Aliasgari, H. Homayoun, H. Sayadi","doi":"10.1109/ISQED51717.2021.9424330","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424330","url":null,"abstract":"Cybersecurity for the past decades has been in the front line of global attention as a critical threat to the information technology infrastructures. According to recent security reports, malicious software (a.k.a. malware) is rising at an alarming rate in numbers as well as harmful purposes to compromise security of computing systems. To address the high complexity and computational overheads of conventional software-based detection techniques, Hardware-Supported Malware Detection (HMD) has proved to be efficient for detecting malware at the processors’ microarchitecture level with the aid of Machine Learning (ML) techniques applied on Hardware Performance Counter (HPC) data. Existing ML-based HMDs while accurate in recognizing known signatures of malicious patterns, have not explored detecting unknown (zero-day) malware data at run-time which is a more challenging problem, since its HPC data does not match any known attack applications’ signatures in the existing database. In this work, we first present a review of recent ML-based HMDs utilizing built-in HPC registers information. Next, we examine the suitability of various standard ML classifiers for zero-day malware detection and demonstrate that such methods are not capable of detecting unknown malware signatures with high detection rate. Lastly, to address the challenge of run-time zero-day malware detection, we propose an ensemble learning-based technique to enhance the performance of the standard malware detectors despite using a small number of microarchitectural features that are captured at run-time by existing HPCs. 
The experimental results demonstrate that our proposed approach, which applies AdaBoost ensemble learning with a Random Forest as the regular classifier, achieves 92% F-measure and 95% TPR with only a 2% false positive rate in detecting zero-day malware using only the top 4 microarchitectural features.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124002915","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Khitam M. Alatoun, B. Shankaranarayanan, Shanmukha Murali Achyutha, R. Vemuri
{"title":"SoC Trust Validation Using Assertion-Based Security Monitors","authors":"Khitam M. Alatoun, B. Shankaranarayanan, Shanmukha Murali Achyutha, R. Vemuri","doi":"10.1109/ISQED51717.2021.9424363","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424363","url":null,"abstract":"Modern SoC applications include a variety of sensitive modules in which data must be protected against malicious access. Security vulnerabilities, when exercised during the SoC operation, lead to denial of service or disclosure of protected data. Hence, it is essential to undertake security validation before and after SoC fabrication and make provisions for continuous security assessment during operation. This paper presents a methodology for optimized post-deployment monitoring of an SoC’s security properties by migrating pre-fab design security assertions to post-fab run-time security monitors. We show that the method is scalable for large systems and complex properties by optimizing the hardware monitors and applying it to a large SoC design based on an OpenRISC-1200 SoC. About 40 security assertions were specified in SystemVerilog Assertions (SVA). Following formal verification, the assertions were synthesized into finite state machines and cross optimized.
Following code generation in Verilog, commercial logic and layout synthesis tools were used to generate hardware monitors which were then integrated with the SoC design ready for fabrication.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"755 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116116236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TRGP: A Low-Cost Re-Configurable TRNG-PUF Architecture for IoT","authors":"V. Rai, S. Tripathy, J. Mathew","doi":"10.1109/ISQED51717.2021.9424347","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424347","url":null,"abstract":"Internet of Things (IoT) devices have made their presence felt across the domain, society, and individuals. These technologies have played a prominent role in shaping the digital world. However, they bring their own set of security and privacy threats. Being resource-constrained, these devices are unable to mitigate the challenges with existing traditional solutions. Various software and hardware solutions have been tuned for IoT security. Physical unclonable functions (PUFs) and random number generators (RNGs) are the most useful for building security applications, especially in resource-constrained devices like IoT. Security protocols, including identification, authentication, and key agreement, can be developed using PUFs, while RNGs can produce ephemeral keys and nonces. Combining a TRNG and a PUF as a reconfigurable circuit reduces the hardware cost and proves to be an effective solution. The memristor, being a fundamental circuit element, offers certain unique advantages, including the possibility of hybridization with complementary metal-oxide-semiconductor (CMOS) circuits. This paper proposes a low-cost re-configurable TRNG-PUF named TRGP, which can harness the benefits of both the TRNG and the PUF.
We evaluate their performance against various parameters and compare them with some of the existing PUF and TRNG architectures.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"24 2","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133169541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hongwu Peng, Shaoyi Huang, Tong Geng, Ang Li, Weiwen Jiang, Hang Liu, Shusen Wang, Caiwen Ding
{"title":"Accelerating Transformer-based Deep Learning Models on FPGAs using Column Balanced Block Pruning","authors":"Hongwu Peng, Shaoyi Huang, Tong Geng, Ang Li, Weiwen Jiang, Hang Liu, Shusen Wang, Caiwen Ding","doi":"10.1109/ISQED51717.2021.9424344","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424344","url":null,"abstract":"Although Transformer-based language representations achieve state-of-the-art accuracy on various natural language processing (NLP) tasks, the large model size has been challenging for resource-constrained computing platforms. Weight pruning, as a popular and effective technique for reducing the number of weight parameters and accelerating the Transformer, has been investigated on GPUs. However, Transformer acceleration using weight pruning on field-programmable gate arrays (FPGAs) remains unexplored. This paper investigates column balanced block-wise pruning on the Transformer and designs an FPGA acceleration engine to customize the balanced block-wise matrix multiplication. We implement the Transformer model with proper hardware scheduling, and the experiments show that Transformer inference on the FPGA achieves 10.35 ms latency with a batch size of 32, which is a $10.96\\times$ speedup compared to the CPU platform and a $2.08\\times$ speedup compared to the GPU platform.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"294 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117353822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Formal Verification Aware Redundant Sequential Logic Optimization to Improve Design Utilization","authors":"Rushabh Shah, Krishna Agrawal","doi":"10.1109/ISQED51717.2021.9424336","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424336","url":null,"abstract":"With continuous advancement in technology, the semiconductor industry is moving towards lower process nodes to improve transistor density, performance, and power optimization. For lower nodes, fabrication gets costlier and area reduction is of prime importance. To align with this goal, one of the key targets during physical implementation is to synthesize the design with the most optimal logic and less redundant functional logic. Even though synthesis tools are optimized to align with customers' targets, there are limitations. Identification of such redundant logic is possible in both synthesis and formal verification tools. This paper presents a novel algorithm and process that identify redundant logic using a formal verification tool and use this data to generate an ECO such that the synthesis tool can optimize logic better than currently known methods. Using the proposed solution, a 1K to 38K reduction in sequential cell count and a 4K to 85K reduction in overall cell count have been observed for various design cases.
This solution provides logic area and power saving without compromising on design testability and formal verification at the cost of runtime increase.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"29 51","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120811031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep Learning assisted Cross-Family Profiled Side-Channel Attacks using Transfer Learning","authors":"Dhruv Thapar, Manaar Alam, Debdeep Mukhopadhyay","doi":"10.1109/ISQED51717.2021.9424254","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424254","url":null,"abstract":"Side-channel analysis (SCA) utilizing the power consumption of a device has proved to be an efficient technique for recovering secret keys by exploiting the implementation vulnerability of mathematically secure cryptographic algorithms. Recently, Deep Learning-based profiled SCA (DL-SCA) has gained popularity, where an adversary trains a deep learning model using profiled traces obtained from a dummy device (a device that is similar to the target device) and uses the trained model to retrieve the secret key from the target device. However, for efficient key recovery from the target device, training of such a model requires a large number of profiled traces from the dummy device and extensive training time. In this paper, we propose TranSCA, a new DL-SCA strategy that tries to address the issue. TranSCA works in three steps: an adversary (1) performs a one-time training of a base model using profiled traces from any device, (2) fine-tunes the parameters of the base model using significantly fewer profiled traces from a dummy device with the aid of a transfer learning strategy, in less time than training from scratch, and (3) uses the fine-tuned model to attack the target device. We validate TranSCA on simulated power traces created to represent different FPGA families. Experimental results show that the transfer learning strategy makes it possible to attack a new device from the knowledge of another device even if the new device belongs to a different family.
Also, TranSCA requires very few power traces from the dummy device compared to when applying DL-SCA without any previous knowledge.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"25 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126994742","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Error Resilient Design Platform for Aggressively Reducing Power, Area and Routing Congestion","authors":"Tung-Liang Lin, Sao-Jie Chen","doi":"10.1109/ISQED51717.2021.9424298","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424298","url":null,"abstract":"In the traditional implementation methodology, a range of target voltage levels as defined in the Unified Power Format (UPF), together with the regular timing constraints, is applied during the timing, area and power optimization stages of RTL-to-gate mapping. However, this approach usually requires mapping to combinational and sequential standard cells with stronger driving strength and bigger size to compensate for the performance degraded by lower supply voltage. In this paper, an innovative power-saving design platform is proposed, using an analysis flow that effectively integrates the following methodologies: (1) path retiming, slack redistribution, and modified razor insertions; (2) customized vector-free approaches and automated procedures for generating corresponding and randomized stimulus for early-stage static and dynamic voltage-aware power analysis; and (3) precise prediction via a Design Dependent Critical Path Monitor (DDCPM) for avoiding unexpected timing violations caused by the aggressive scaling of supply voltage during fine-grained DVFS. Accordingly, the platform not only dramatically reduces power consumption and chip area but also effectively alleviates the serious routing congestion that often occurs in designs with a high proportion of long critical timing paths.
One of our experimental results in the TSMC 55 nm process node shows maximum power and area reductions of 62.7% and 29.1%, respectively.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128672396","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sabine Pircher, J. Geier, A. Zeh, Daniel Mueller-Gritschneder
{"title":"Exploring the RISC-V Vector Extension for the Classic McEliece Post-Quantum Cryptosystem","authors":"Sabine Pircher, J. Geier, A. Zeh, Daniel Mueller-Gritschneder","doi":"10.1109/ISQED51717.2021.9424273","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424273","url":null,"abstract":"The dawn of quantum computers threatens the security guarantees of classical public-key cryptography. This gave rise to a new class of so-called quantum-resistant cryptography algorithms and a need to efficiently implement them on embedded hardware platforms. This paper investigates how we can exploit the most recent RISC-V Vector Extension Version 0.9 (RVV0.9) to accelerate the quantum-resistant code-based Classic McEliece cryptosystem. We focused on the Gaussian Elimination Algorithm (GEA) that is essential for the key generation of the McEliece scheme. The GEA offers high potential for acceleration by vector instructions of the RVV extension. In order to evaluate the possible gains, we adopted a rapid prototyping approach based on an instruction set simulator (ISS). We extended the simulator ETISS with a SoftVector library, which allows the instructions of RVV to be modeled quickly. Using the rapid prototyping environment, the GEA was re-implemented and verified for RVV0.9. The final performance gain heavily depends on the memory interface of the vector unit. For different configurations of the memory system, we could profile performance gains of 6 up to 18 for the GEA.
This clearly shows the benefit of RVV for implementing quantum-resistant cryptosystems.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125683931","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Resistor-less, Nano-Watt CMOS Voltage Reference with High PSRR","authors":"Naveed, J. Dix","doi":"10.1109/ISQED51717.2021.9424289","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424289","url":null,"abstract":"A resistor-less nano-Watt voltage reference circuit using only Metal-Oxide Semiconductor Field Effect Transistors (MOSFETs) is presented. To ensure low-power operation, the circuit is biased in the subthreshold region. The reference voltage is generated by using a temperature dependent current generated from a modified Oguey Current Source to bias a diode-connected MOSFET. The proposed circuit is simulated using a 65 nm Complementary Metal-Oxide Semiconductor (CMOS) process. The circuit can operate from 0.65 to 2.5 V in the temperature range from -30 to $80^{\\circ}\\mathrm{C}$. The circuit achieves a temperature coefficient of 19.3 ppm $/^{\\circ}\\mathrm{C}$ while consuming 3.64 nW power at room temperature. The line sensitivity of the circuit is 0.0026 %/V and the power supply rejection ratio (PSRR) is -75 dB at 100 Hz. The voltage reference occupies 0.0063 mm$^2$ of area.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"25 4","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113960726","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3D IC Packaging Utilizing a Metal Structure for Heat Reduction, Noise Shielding, and High Interconnect Density","authors":"Nahid Mirzaie, R. Rohrer","doi":"10.1109/ISQED51717.2021.9424326","DOIUrl":"https://doi.org/10.1109/ISQED51717.2021.9424326","url":null,"abstract":"A three-dimensional (3D) microelectronic package structure is proposed to address thermal management by reducing noise and crosstalk. The packaging architecture includes an array of through-silicon-vias (TSVs) located inside an interposer layer with a shielding structure. Each of the TSVs is surrounded by a metal box as a heat spreader and noise/distortion shield. The metal shielding structure is electrically connected to the ground. The boxes also may be further connected to an external heat sink through at least several external contacts. The simulated results show significant improvement in terms of power reduction and signal integrity due to the suppressed coupling noise and crosstalk between adjacent TSVs, thereby reducing heat.","PeriodicalId":123018,"journal":{"name":"2021 22nd International Symposium on Quality Electronic Design (ISQED)","volume":"159 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134052444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}