{"title":"Single Flux Quantum Circuit Technology and CAD overview","authors":"C. Fourie","doi":"10.1145/3240765.3243498","DOIUrl":"https://doi.org/10.1145/3240765.3243498","url":null,"abstract":"Single Flux Quantum (SFQ) electronic circuits originated with the advent of Rapid Single Flux Quantum (RSFQ) logic in 1985 and have since evolved to include more energy-efficient technologies such as ERSFQ and eSFQ. SFQ logic circuits, based on the manipulation of quantized flux pulses, have been demonstrated to run at clock speeds in excess of 120 GHz, and with bit-switch energy below 1 aJ. Small SFQ microprocessors have been developed, but characteristics inherent to SFQ circuits and the lack of circuit design tools have hampered the development of large SFQ systems. SFQ circuit characteristics include fan-out of one and the subsequent demand for pulse splitters, gate-level clocking, susceptibility to magnetic fields and sensitivity to intra-gate and inter-gate inductance. Superconducting interconnects propagate data pulses at the speed of light, but suffer from reflections at vias that attenuate transmitted pulses. The recently started IARPA SuperTools program aims to deliver SFQ Computer-Aided Design (CAD) tools that can enable the successful design of 64 bit RISC processors given the characteristics of SFQ circuits. A discussion on the technology of SFQ circuits and the most modern SFQ fabrication processes is presented, with a focus on the unique electronic design automation CAD requirements for the design, layout and verification of SFQ circuits.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126978133","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Kent W. Nixon, Jiachen Mao, Juncheng Shen, Huanrui Yang, H. Li, Yiran Chen
{"title":"SPN Dash - Fast Detection of Adversarial Attacks on Mobile via Sensor Pattern Noise Fingerprinting","authors":"Kent W. Nixon, Jiachen Mao, Juncheng Shen, Huanrui Yang, H. Li, Yiran Chen","doi":"10.1145/3240765.3240851","DOIUrl":"https://doi.org/10.1145/3240765.3240851","url":null,"abstract":"A concerning weakness of deep neural networks is their susceptibility to adversarial attacks. While methods exist to detect these attacks, they incur significant drawbacks, ignoring external features which could aid in the task of attack detection. In this work, we propose SPN Dash, a method for detection of adversarial attacks based on integrity of sensor pattern noise embedded in submitted images. Through experiment, we show that our SPN Dash method is capable of detecting the addition of adversarial noise with up to 94% accuracy for images of size $256times256$. Analysis shows that SPN Dash is robust to image scaling techniques, as well as a small amount of image compression. This performance is on par with state of the art neural network-based detectors, while incurring an order of magnitude less computational and memory overhead.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133737389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wentai Zhang, Hanxian Huang, Jiaxi Zhang, M. Jiang, Guojie Luo
{"title":"Adaptive-Precision Framework for SGD Using Deep Q-Learning","authors":"Wentai Zhang, Hanxian Huang, Jiaxi Zhang, M. Jiang, Guojie Luo","doi":"10.1145/3240765.3240774","DOIUrl":"https://doi.org/10.1145/3240765.3240774","url":null,"abstract":"Stochastic gradient descent (SGD) is a widely-used algorithm in many applications, especially in the training process of deep learning models. Low-precision implementation for SGD has been studied as a major acceleration approach. However, if not appropriately used, low-precision implementation can deteriorate its convergence because of the rounding error when gradients become small near a local optimum. In this work, to balance throughput and algorithmic accuracy, we apply the Q-learning technique to adjust the precision of SGD automatically by designing an appropriate decision function. The proposed decision function for Q-learning takes the error rate of the objective function, its gradients, and the current precision configuration as the inputs. Q-learning then chooses proper precision adaptively for hardware efficiency and algorithmic accuracy. We use reconfigurable devices such as FPGAs to evaluate the adaptive precision configurations generated by the proposed Q-learning method. We prototype the framework using LeNet-5 model with MNIST and CIFAR10 datasets and implement it on a Xilinx KCU1500 FPGA board. In the experiments, we analyze the throughput of different precision representations and the precision-selection of our framework. The results show that the proposed framework with adapative precision increases the throughput by up to 4.3× compared to the conventional 32-bit floating point setting, and it achieves both the best hardware efficiency and algorithmic accuracy.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130772244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Amarù, Eleonora Testa, Miguel Couceiro, O. Zografos, G. Micheli, Mathias Soeken
{"title":"Majority Logic Synthesis","authors":"L. Amarù, Eleonora Testa, Miguel Couceiro, O. Zografos, G. Micheli, Mathias Soeken","doi":"10.1145/3240765.3267501","DOIUrl":"https://doi.org/10.1145/3240765.3267501","url":null,"abstract":"The majority function $langle xyzrangle$ evaluates to true, if at least two of its Boolean inputs evaluate to true. The majority function has frequently been studied as a central primitive in logic synthesis applications for many decades. Knuth refers to the majority function in the last volume of his seminal The Art of Computer Programming as “probably the most important ternary operation in the entire universe.” Majority logic sythesis has recently regained signficant interest in the design automation community due to nanoemerging technologies which operate based on the majority function. In addition, majority logic synthesis has successfully been employed in CMOS-based applications such as standard cell or FPGA mapping. This tutorial gives a broad introduction into the field of majority logic synthesis. It will review fundamental results and describe recent contributions from theory, practice, and applications.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"21 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133059973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning IP Protection","authors":"Rosario Cammarota, Indranil Banerjee, Ofer Rosenberg","doi":"10.1145/3240765.3270589","DOIUrl":"https://doi.org/10.1145/3240765.3270589","url":null,"abstract":"Machine learning, specifically deep learning is becoming a key technology component in application domains such as identity management, finance, automotive, and healthcare, to name a few. Proprietary machine learning models - Machine Learning IP - are developed and deployed at the network edge, end devices and in the cloud, to maximize user experience. With the proliferation of applications embedding Machine Learning IPs, machine learning models and hyper-parameters become attractive to attackers, and require protection. Major players in the semiconductor industry provide mechanisms on device to protect the IP at rest and during execution from being copied, altered, reverse engineered, and abused by attackers. In this work we explore system security architecture mechanisms and their applications to Machine Learning IP protection.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"213 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115646864","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPU Acceleration of RSA is Vulnerable to Side-channel Timing Attacks","authors":"Chao Luo, Yunsi Fei, D. Kaeli","doi":"10.1145/3240765.3240812","DOIUrl":"https://doi.org/10.1145/3240765.3240812","url":null,"abstract":"The RSA algorithm [21] is a public-key cipher widely used in digital signatures and Internet protocols, including the Security Socket Layer (SSL) and Transport Layer Security (TLS). RSA entails excessive computational complexity compared with symmetric ciphers. For scenarios where an Internet domain is handling a large number of SSL connections and generating digital signatures for a large number of files, the amount of RSA computation becomes a major performance bottleneck. With the advent of general-purpose GPUs, the performance of RSA has been improved significantly by exploiting parallel computing on a GPU [9], [18], [23], [26], leveraging the Single Instruction Multiple Thread (SIMT) model.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"14 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116057952","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
V. Possani, Yi-Shan Lu, A. Mishchenko, K. Pingali, R. Ribas, A. Reis
{"title":"Unlocking Fine-Grain Parallelism for AIG Rewriting","authors":"V. Possani, Yi-Shan Lu, A. Mishchenko, K. Pingali, R. Ribas, A. Reis","doi":"10.1145/3240765.3240861","DOIUrl":"https://doi.org/10.1145/3240765.3240861","url":null,"abstract":"Parallel computing is a trend to enhance scalability of electronic design automation (EDA) tools using widely available multicore platforms. In order to benefit from parallelism, well-known EDA algorithms have to be reformulated and optimized for multicore implementation. This paper introduces a set of principles to enable a fine-grain parallel AND-inverter graph (AIG) rewriting. It presents a novel method to discover and rewrite in parallel parts of the AIG, without the need for graph partitioning. Experiments show that, when synthesizing large designs composed of millions of AIG nodes, the parallel rewriting on 40 physical cores is up to 36x and 68x faster than ABC commands rewrite −l and drw, respectively, with comparable quality of results in terms of AIG size and depth.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"19 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122351375","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HLS-Based Optimization and Design Space Exploration for Applications with Variable Loop Bounds","authors":"Young-kyu Choi, J. Cong","doi":"10.1145/3240765.3240815","DOIUrl":"https://doi.org/10.1145/3240765.3240815","url":null,"abstract":"In order to further increase the productivity of field-programmable gate array (FPGA) programmers, several design space exploration (DSE) frameworks for high-level synthesis (HLS) tools have been recently proposed to automatically determine the FPGA design parameters. However, one of the common limitations found in these tools is that they cannot find a design point with large speedup for applications with variable loop bounds. The reason is that loops with variable loop bounds cannot be efficiently parallelized or pipelined with simple insertion of HLS directives. Also, making highly accurate prediction of cycles and resource consumption on the entire design space becomes a challenging task because of the inaccuracy of the HLS tool cycle prediction and the wide design space. In this paper we present an HLS-based FPGA optimization and DSE framework that produces a high-performance design even in the presence of variable loop bounds. We propose code transformations that increase the utilization of the compute resources for variable loops, including several computation patterns with loop-carried dependency such as floating-point reduction and prefix sum. In order to rapidly perform DSE with high accuracy, we describe a resource and cycle estimation model constructed from the information obtained from the actual HLS synthesis. Experiments on applications with variable loop bounds in Polybench benchmarks with Vivado HLS show that our framework improves the baseline implementation by 75X on average and outperforms current state-of-the-art DSE frameworks.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128548975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic Resource Management for Heterogeneous Many-Cores","authors":"J. Henkel, J. Teich, S. Wildermann, H. Amrouch","doi":"10.1145/3240765.3243471","DOIUrl":"https://doi.org/10.1145/3240765.3243471","url":null,"abstract":"With the advent of many-core systems, use cases of embedded systems have become more dynamic: Plenty of applications are concurrently executed, but may dynamically be exchanged and modified even after deployment. Moreover, resources may temporally or permanently become unavailable because of thermal aspects, dynamic power management, or the occurrence of faults. This poses new challenges for reaching objectives like timeliness for real-time or performance for best-effort program execution and maximizing system utilization. In this work, we first focus on dynamic management schemes for reliability/aging optimization under thermal constraints. The reliability of on-chip systems in the current and upcoming technology nodes is continuously degrading with every new generation because transistor scaling is approaching its fundamental limits. Protecting systems against degradation effects such as circuits' aging comes with considerable losses in efficiency. We demonstrate in this work why sustaining reliability while maximizing the utilization of available resources and hence avoiding efficiency loss is quite challenging – this holds even more when thermal constraints come into play. Then, we discuss techniques for run-time management of multiple applications which sustain real-time properties. Our solution relies on hybrid application mapping denoting the combination of design-time analysis with run-time application mapping. We present a method for Real-time Mapping Reconfiguration (RMR) which enables the Run-Time Manager (RM) to execute realtime applications even in the presence of dynamic thermal- and reliability-aware resource management. This paper is paper of the ICCAD 2018 Special Session on “Managing Heterogeneous Many-cores for High-Performance and Energy-Efficiency”. The other two papers of this Special sessions are [1] and [2].","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131095486","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Customized Locking of IP Blocks on a Multi-Million-Gate SoC","authors":"A. Sengupta, M. Nabeel, M. Ashraf, O. Sinanoglu","doi":"10.1145/3240765.3243467","DOIUrl":"https://doi.org/10.1145/3240765.3243467","url":null,"abstract":"Reliance on off-site untrusted fabrication facilities has given rise to several threats such as intellectual property (IP) piracy, overbuilding and hardware Trojans. Logic locking is a promising defense technique against such malicious activities that is effected at the silicon layer. Over the past decade, several logic locking defenses and attacks have been presented, thereby, enhancing the state-of-the-art. Nevertheless, there has been little research aiming to demonstrate the applicability of logic locking with large-scale multi-million-gate industrial designs consisting of multiple IP blocks with different security requirements. In this work, we take on this challenge to successfully lock a multi-million-gate system-on-chip (SoC) provided by DARPA by taking it all the way to GDSII layout. We analyze how specific features, constraints, and security requirements of an IP block can be leveraged to lock its functionality in the most appropriate way. We show that the blocks of an SoC can be locked in a customized manner at 0.5%, 15.3%, and 1.5% chip-level overhead in power, performance, and area, respectively.","PeriodicalId":413037,"journal":{"name":"2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2018-11-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128078334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}