{"title":"An efficient hardware design for cerebellar models using approximate circuits: special session paper","authors":"Honglan Jiang, Leibo Liu, Jie Han","doi":"10.1145/3125502.3125537","DOIUrl":"https://doi.org/10.1145/3125502.3125537","url":null,"abstract":"The superior controllability of the cerebellum has motivated extensive interest in the development of computational cerebellar models. Many models have been applied to the motor control and image stabilization in robots. Often computationally complex, cerebellar models have rarely been implemented in dedicated hardware. Here, we propose an efficient hardware design for cerebellar models using approximate circuits with a small area and a low power. Leveraging the inherent error tolerance in the cerebellum, approximate adders and multipliers are carefully evaluated for implementations in an adaptive filter based cerebellar model to achieve a good tradeoff in accuracy and hardware usage. A saccade system, whose vestibulo-ocular reflex (VOR) is controlled by the cerebellum, is simulated to show the applicability and effectiveness of the proposed design. Simulation results show that the approximate cerebellar circuit achieves a similar accuracy as an exact implementation, but it saves area by 29.7% and power by 37.3%.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"50 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116757605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hantao Huang, Suleman Khalid Rai, Wenye Liu, Hao Yu
{"title":"A fast online sequential learning accelerator for IoT network intrusion detection: work-in-progress","authors":"Hantao Huang, Suleman Khalid Rai, Wenye Liu, Hao Yu","doi":"10.1145/3125502.3125532","DOIUrl":"https://doi.org/10.1145/3125502.3125532","url":null,"abstract":"Deployment of IoT devices for smart buildings and homes will offer a high level of comfortability with increased energy efficiency; but can also introduce potential cyber-attacks such as network intrusions via linked IoT devices. Due to the low-power and low-latency requirement to secure IoT network, traditional software based security system is not applicable. Instead, an embedded hardware-accelerator based data analytics is more preferred for network intrusion detection. In this paper, we propose an online sequential machine learning hardware accelerator to perform realtime network intrusion detection. A single hidden layer feedforward neural network based learning algorithm is developed with a least-squares solver realized on hardware. Experimental results on a single FPGA achieve a bandwidth of 409.6 Gbps with fast yet low-power network intrusion detection based on a number of benchmarks.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124002904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Heterogeneous redundancy to address performance and cost in multi-core SIMT: work-in-progress","authors":"M. Naghashi, S. H. Mozafari, S. Hessabi","doi":"10.1145/3125502.3125547","DOIUrl":"https://doi.org/10.1145/3125502.3125547","url":null,"abstract":"As manufacturing processes scale to smaller feature sizes and processors become more complex, it is becoming challenging to have fabricated devices that operate according to their specification in the first place: yield losses are mounting [3].","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"99 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123024593","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DOVE: pinpointing firmware security vulnerabilities via symbolic control flow assertion mining (work-in-progress)","authors":"Alessandro Danese, G. Pravadelli, V. Bertacco","doi":"10.1145/3125502.3125541","DOIUrl":"https://doi.org/10.1145/3125502.3125541","url":null,"abstract":"In the past decade, the number of reported security attacks exploiting unchecked input firmware values has been on the rise. To address this concerning trend, this work proposes a novel detection framework, called DOVE, capable of identifying unlikely firmware execution flows, specifically those that may reveal a security vulnerability. The DOVE framework operates by leveraging a symbolic simulation of the firmware's execution, paired with a probability computation that can identify unlikely execution flows and provide to the user corresponding formal assertions.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"12 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121013259","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IR-level annotation strategy dealing with aggressive loop optimizations for performance estimation in native simulation: work-in-progress","authors":"Omayma Matoussi, F. Pétrot","doi":"10.1145/3125502.3125550","DOIUrl":"https://doi.org/10.1145/3125502.3125550","url":null,"abstract":"Originally developed for purely functional verification of software, native or host compiled simulation [6] has gained momentum, thanks to its considerable speedup compared to instruction set simulation (ISS). To obtain a performance model of the software, non-functional information is computed from the target binary code using low-level analysis and back-annotated into the high-level code used to generate it. This annotated functional model is then natively compiled and executed on the host machine for fast software timing [8] estimations. Back-annotating at the right place needs a mapping between the binary instructions and the high-level code statements. So, it is necessary to decide at which stage of the software compilation process the information is back-annotated. There are three possibilities: in the original source code ([7]), in the host binary code ([3]), or in the compiler intermediate representation (IR) ([8], [2]). As compilers perform many optimizations to enhance software performance, the source code and the binary code structures may be radically different. In this work, we define a mapping approach between the compiler's IR and the binary control flow graph (CFG) when a high-level of compiler optimizations (eg. O3 in gcc) is used. Our approach handles aggressive compiler optimizations such as loop unrolling without having to introduce any modification to the compiler.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127298869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hampering fault attacks against lattice-based signature schemes: countermeasures and their efficiency (special session)","authors":"Nina Bindel, Juliane Krämer, Johannes Schreiber","doi":"10.1145/3125502.3125546","DOIUrl":"https://doi.org/10.1145/3125502.3125546","url":null,"abstract":"Research on physical attacks on lattice-based cryptography has seen some progress in recent years and first attacks and countermeasures have been described. In this work, we perform an exhaustive literature review on fault attacks on lattice-based encryption and signature schemes. Based on this, we provide a complete overview of suggested countermeasures and analyze which of the proposed attacks can prevented by respective countermeasures. Moreover, we show for selected countermeasures how they affect the runtime of the protected operations.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"58 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121433155","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Trends, challenges and needs for lattice-based cryptography implementations: special session","authors":"Hamid Nejatollahi, N. Dutt, Rosario Cammarota","doi":"10.1145/3125502.3125559","DOIUrl":"https://doi.org/10.1145/3125502.3125559","url":null,"abstract":"Advances in computing steadily erode computer security at its foundation, calling for fundamental innovations to strengthen the weakening cryptographic primitives and security protocols. At the same time, the emergence of new computing paradigms, such as Cloud Computing and Internet of Everything, demand that innovations in security extend beyond their foundational aspects, to the actual design and deployment of these primitives and protocols while satisfying emerging design constraints such as latency, compactness, energy efficiency, and agility. While many alternatives have been proposed for symmetric key cryptography and related protocols (e.g., lightweight ciphers and authenticated encryption), the alternatives for public key cryptography are limited to post-quantum cryptography primitives and their protocols. In particular, lattice-based cryptography is a promising candidate, both in terms of foundational properties, as well as its application to traditional security problems such as key exchange, digital signature, and encryption/decryption. We summarize trends in lattice-based cryptographic schemes, some fundamental recent proposals for the use of lattices in computer security, challenges for their implementation in software and hardware, and emerging needs.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"23 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126500756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bryan Donyanavard, Amir Mahdi Hosseini Monazzah, T. Mück, N. Dutt
{"title":"Exploring fast and slow memories in HMP core types: work-in-progress","authors":"Bryan Donyanavard, Amir Mahdi Hosseini Monazzah, T. Mück, N. Dutt","doi":"10.1145/3125502.3125545","DOIUrl":"https://doi.org/10.1145/3125502.3125545","url":null,"abstract":"Studies have shown memory and computational needs vary independently across applications. Recent work has explored using hybrid memory technology (SRAM+NVM) in on-chip memories of multicore processors (CMPs) to support the varied needs of diverse workloads. Such works suggest architectural modifications that require supplemental management in the memory hierarchy. Instead, we propose to deploy hybrid memory in a manner that integrates seamlessly with the existing heterogeneous multicore (HMP) architectural model, and therefore does not require any architectural modification, simply the integration of different memory technologies on-chip. We evaluate platforms with a combination of fast (SRAM cache) and slow (STT-MRAM cache) core-types for mobile workloads.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115744986","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Lei Gong, Chao Wang, Xi Li, Hua-ping Chen, Xuehai Zhou
{"title":"A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress","authors":"Lei Gong, Chao Wang, Xi Li, Hua-ping Chen, Xuehai Zhou","doi":"10.1145/3125502.3125534","DOIUrl":"https://doi.org/10.1145/3125502.3125534","url":null,"abstract":"Recently, FPGAs have been widely used in the implementation of hardware accelerators for Convolutional Neural Networks (CNN), especially on mobile and embedded devices. However, most of these existing accelerators are designed with the same concept as their ASIC counterparts, that is all operations from different CNN layers are mapped to the same hardware units and work in a multiplexed way. Although this approach improves the generality of these accelerators, it does not take full advantage of reconfigurability and customizability of FPGAs, resulting in a certain degree of computational efficiency degradation, which is even worse on the embedded platforms. In this paper, we propose an FPGA-based CNN accelerator with all the layers mapped to their own on-chip units, and working concurrently as a pipeline. A strategy which can find the optimized paralleling scheme for each layer is proposed to eliminate the pipeline stall and achieve high resource utilization. In addition, a balanced pruning-based method is applied on fully connected (FC) layers to reduce the computational redundancy. As a case study, we implement a widely used CNNs model, LeNet-5, on an embedded FPGA device, Xilinx Zedboard. It can achieve a peak performance of 39.78 GOP/s and the power efficiency with a value 19.6 GOP/s/W which outperforms previous approaches.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129687492","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
S. Pasricha, J. Doppa, K. Chakrabarty, Saideep Tiku, D. Dauwe, Shi Jin, P. Pande
{"title":"Data analytics enables energy-efficiency and robustness: from mobile to manycores, datacenters, and networks (special session paper)","authors":"S. Pasricha, J. Doppa, K. Chakrabarty, Saideep Tiku, D. Dauwe, Shi Jin, P. Pande","doi":"10.1145/3125502.3125560","DOIUrl":"https://doi.org/10.1145/3125502.3125560","url":null,"abstract":"The amount of data generated and collected across computing platforms every day is not only enormous, but growing at an exponential rate. Advanced data analytics and machine-learning techniques have become increasingly essential to analyze and extract meaning from such \"Big Data\". These techniques can be very useful to detect patterns and trends to improve the operational behavior of computing platforms, but they also introduce a number of outstanding challenges: (1) How can we design and deploy data analytics and learning mechanisms to improve energy-efficiency in IoT and mobile devices, without introducing significant software overheads? (2) How to use machine learning and analytics techniques for effective designspace exploration during manycore chip design? (3) How can data analytics and learning improve the reliability and energy-efficiency of large-scale cloud datacenters, to cost-effectively support connected embedded and IoT platforms? (4) How can data analytics detect anomalies and increase robustness in the network backbone of emerging cloud datacenter networks? In this paper, we discuss these outstanding problems and describe far-reaching solutions applicable across the interconnected ecosystem of IoT and mobile devices, manycore chips, datacenters, and networks.","PeriodicalId":350509,"journal":{"name":"Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion","volume":"938 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2017-10-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123064263","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}