{"title":"A Fully Onchip Binarized Convolutional Neural Network FPGA Impelmentation with Accurate Inference","authors":"Li Yang, Zhezhi He, Deliang Fan","doi":"10.1145/3218603.3218615","DOIUrl":"https://doi.org/10.1145/3218603.3218615","url":null,"abstract":"Deep convolutional neural networks play an important role in machine learning and are widely used in computer vision tasks. However, their enormous model size and massive computation cost have become the main obstacle to deploying such powerful algorithms in low-power, resource-limited embedded systems such as FPGAs. Recent works have shown that binarized neural networks (BNNs), which use binarized (i.e. +1 and -1) convolution kernels and a binary activation function, can significantly reduce model size and computation complexity, which paves a new road for energy-efficient FPGA implementation. In this work, we first propose a new BNN algorithm, called Parallel-Convolution BNN (i.e. PC-BNN), which replaces the original binary convolution layer in a conventional BNN with two parallel binary convolution layers. PC-BNN achieves ~86% on the CIFAR-10 dataset with only 2.3Mb parameter size. We then deploy our proposed PC-BNN on the Xilinx PYNQ Z1 FPGA board, which has only 4.9Mb of on-chip RAM. Because the network parameters are so small, it is feasible to store the whole network in on-chip RAM, which greatly reduces the energy and delay overhead of loading network parameters from off-chip memory. Meanwhile, a new data streaming pipeline architecture is proposed in the PC-BNN FPGA implementation to further improve throughput. 
The experimental results show that our PC-BNN based FPGA implementation achieves 930 frames per second, 387.5 FPS/Watt and 396×10⁻⁴ FPS/LUT, which are among the best throughput and energy efficiency compared to the most recent works.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-07-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90679359","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In-situ Stochastic Training of MTJ Crossbar based Neural Networks","authors":"Ankit Mondal, Ankur Srivastava","doi":"10.1145/3218603.3218616","DOIUrl":"https://doi.org/10.1145/3218603.3218616","url":null,"abstract":"Owing to high device density, scalability and non-volatility, Magnetic Tunnel Junction-based crossbars have garnered significant interest for implementing the weights of an artificial neural network. The existence of only two stable states in MTJs implies a high overhead of obtaining optimal binary weights in software. We illustrate that the inherent parallelism in the crossbar structure makes it highly appropriate for in-situ training, wherein the network is taught directly on the hardware. It leads to significantly smaller training overhead as the training time is independent of the size of the network, while also circumventing the effects of alternate current paths in the crossbar and accounting for manufacturing variations in the device. We show how the stochastic switching characteristics of MTJs can be leveraged to perform probabilistic weight updates using the gradient descent algorithm. We describe how the update operations can be performed on crossbars both with and without access transistors and perform simulations on them to demonstrate the effectiveness of our techniques. 
The results reveal that stochastically trained MTJ-crossbar NNs achieve a classification accuracy nearly the same as that of real-valued-weight networks trained in software and exhibit immunity to device variations.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72772373","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deploying Customized Data Representation and Approximate Computing in Machine Learning Applications","authors":"M. Nazemi, Massoud Pedram","doi":"10.1145/3218603.3218612","DOIUrl":"https://doi.org/10.1145/3218603.3218612","url":null,"abstract":"Major advancements in building general-purpose and customized hardware have been one of the key enablers of versatility and pervasiveness of machine learning models such as deep neural networks. To sustain this ubiquitous deployment of machine learning models and cope with their computational and storage complexity, several solutions such as low-precision representation of model parameters using fixed-point representation and deploying approximate arithmetic operations have been employed. Studying the potency of such solutions in different applications requires integrating them into existing machine learning frameworks for high-level simulations as well as implementing them in hardware to analyze their effects on power/energy dissipation, throughput, and chip area. Lop is a library for design space exploration that bridges the gap between machine learning and efficient hardware realization. It comprises a Python module, which can be integrated with some of the existing machine learning frameworks and implements various customizable data representations including fixed-point and floating-point as well as approximate arithmetic operations. Furthermore, it includes a highly-parameterized Scala module, which allows synthesizing hardware based on the said data representations and arithmetic operations. Lop allows researchers and designers to quickly compare quality of their models using various data representations and arithmetic operations in Python and contrast the hardware cost of viable representations by synthesizing them on their target platforms (e.g., FPGA or ASIC). 
To the best of our knowledge, Lop is the first library that allows both software simulation and hardware realization using customized data representations and approximate computing techniques.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89960323","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"AxTrain: Hardware-Oriented Neural Network Training for Approximate Inference","authors":"Xin He, Liu Ke, Wenyan Lu, Guihai Yan, Xuan Zhang","doi":"10.1145/3218603.3218643","DOIUrl":"https://doi.org/10.1145/3218603.3218643","url":null,"abstract":"The intrinsic error tolerance of neural network (NN) makes approximate computing a promising technique to improve the energy efficiency of NN inference. Conventional approximate computing focuses on balancing the efficiency-accuracy trade-off for existing pre-trained networks, which can lead to suboptimal solutions. In this paper, we propose AxTrain, a hardware-oriented training framework to facilitate approximate computing for NN inference. Specifically, AxTrain leverages the synergy between two orthogonal methods---one actively searches for a network parameters distribution with high error tolerance, and the other passively learns resilient weights by numerically incorporating the noise distributions of the approximate hardware in the forward pass during the training phase. Experimental results from various datasets with near-threshold computing and approximation multiplication strategies demonstrate AxTrain's ability to obtain resilient neural network parameters and system energy efficiency improvement.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2018-05-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80241455","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keynote: Peering into the post Moore's Law world","authors":"T. Austin","doi":"10.1109/ISLPED.2017.8009150","DOIUrl":"https://doi.org/10.1109/ISLPED.2017.8009150","url":null,"abstract":"For decades, Moore's Law dimensional scaling has been the fuel that propelled the computing industry forward, by delivering performance, power and cost advantages with each new generation of silicon. Today, these scaling benefits are slowing to a crawl. If the computing industry wants to continue to make scalability the primary source of value in tomorrow's computing systems, we will have to quickly find new and productive ways to scale future systems. In this talk, I will highlight my work and the work of others that is rejuvenating scaling through the application of heterogeneous parallel designs. Leveraging these technologies to solve the scaling problem will be a significant challenge, as future scalability success will ultimately be less about “how” to do it and more about “how much” will it cost.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91332962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keynote: Architecture and software for emerging low-power systems","authors":"Wen-mei W. Hwu","doi":"10.1109/ISLPED.2017.8009151","DOIUrl":"https://doi.org/10.1109/ISLPED.2017.8009151","url":null,"abstract":"We have been experiencing two very important developments in computing. On the one hand, a tremendous amount of resources has been invested into innovative applications such as first-principle based models, deep learning and cognitive computing. On the other hand, the industry has been taking a technological path where application performance and power efficiency vary by more than two orders of magnitude depending on their parallelism, heterogeneity, and locality. We envision that a “perfect storm” is coming for future computing, resulting from the fact that data movement has become the dominating factor for both power and performance of high-valued applications. It will be critical to match the compute throughput to the data access bandwidth and to locate the compute where the data is. Much has been learned, and much more needs to be learned, about algorithms, languages, compilers and hardware architecture in this movement. What are the killer applications that may become the new driver for future technology development? How hard is it to program existing systems to address the data movement issues today? How will we program these systems in the future? How will innovations in memory devices present further opportunities and challenges in designing new systems? What is the impact on long-term software engineering cost on applications (and legacy applications in particular)? 
In this talk, I will present some lessons learned as we design the IBM-Illinois C3SR Erudite system inside this perfect storm.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"77640367","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Keynote: A new Silicon Age 4.0: Generating semiconductor-intelligence paradigm with a Virtual Moore's Law Economics and Heterogeneous technologies","authors":"Nicky Liu","doi":"10.1109/ISLPED.2017.8009149","DOIUrl":"https://doi.org/10.1109/ISLPED.2017.8009149","url":null,"abstract":"The future of the silicon-based economy will not be as pessimistic as some commentators have argued, given their predictions of the end of Moore's Law Economy (ME) by the early 2020s. On the contrary, a Virtual Moore's Law Economy (VME) will develop and thrive, advancing innovation by a new Silicon Way of producing various application-driven Heterogeneous Integrated (HI) Nano-systems by optimization of physics, materials, devices, circuits/chips, software and systems to enable exciting applications for business growth. The semiconductor industry will enjoy sufficient financial returns from new application and system-product sales, even considering more expensive silicon investment. Such a technological approach based on a (Function × Value)-Scaling Down-Plus-Up Methodology, in addition to Linear-Scaling, Area-Scaling and Volumetric-Scaling Methodologies, can fundamentally change the way of thinking and execution toward optimizing coherently both technology definition and final system design with an holistic HIDAS (HI Design/Architecture/System) method. 
This will drive IC scaling to an effective 1-Nanometer Realm, stimulating a thriving silicon industry which can have at least 30 more years of growth toward a 1 trillion-dollar size.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82559147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Message from the program co-chairs","authors":"J. Kulkarni, T. Wenisch","doi":"10.1109/ISLPED.2017.8009141","DOIUrl":"https://doi.org/10.1109/ISLPED.2017.8009141","url":null,"abstract":"Over the past few years, however, issues related to the broader context for Dublin Core have come to the fore, such as XML and, since 1997, the Resource Description Framework, now part of a Semantic Web Activity; harvesting approaches such as Open Archives Initiative; domainspecific metadata standards and their relation to a “core”; and the constructs and policies necessary for managing namespaces and “application profiles” for automatic processing by machines.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79622275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Let's get physical: Adding physical dimensions to cyber systems","authors":"A. S. Vincentelli","doi":"10.1109/ISLPED.2015.7273478","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273478","url":null,"abstract":"Technology advances are creating major shifts in the industrial landscape. Traditional sectors such as transportation, medical and avionics are witnessing fundamental changes in the supply chain and in the content where the interactions between the physical world and the computing world are becoming increasingly tight. Cyber Physical Systems, Systems of Systems, Internet of Things, Industrie 4.0, Swarm Systems and The Fog are all sectors that attract massive attention from the research communities and massive investment from industry. These concepts are tightly intertwined and describe a movement towards a fully interconnected planet where billions of devices interact via a complex mesh of wireless and wired communication infrastructures. The most compelling vision for the future of technology and industry is one where a swarm of devices is connected with the cloud to provide platforms for a myriad of new applications. In this new world, new companies will arise and established ones will have to radically change their business models. The increasing sophistication and heterogeneity of these systems require radical changes in the way sense-and-control platforms are designed to regulate them. In this presentation, I highlight some of the design challenges due to the complexity, heterogeneity and power consumption of CPS. Indeed, low power consumption is an essential requirement for the swarm of devices, especially in the domain of wearable devices for healthcare. 
Coupled with low cost and reliability, power consumption has to be taken into consideration for any CPS deployment.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82244860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Opportunities in system power management for high performance mixed signal platforms","authors":"Jose Pineda de Gyvez","doi":"10.1109/ISLPED.2015.7273479","DOIUrl":"https://doi.org/10.1109/ISLPED.2015.7273479","url":null,"abstract":"In an era in which foundry technologies are a commodity, product differentiation comes by design. As fabless becomes mainstream, a paradigm shift demands innovation and collaboration among industrial and research thinking to further lower the cost of ICs, as well as to address upcoming power-performance challenges. High performance mixed signal (HPMS) platforms require stringent overall system and subsystem performance. The ability to design ultra-low power systems is used in a wide range of platforms including consumer, mobile, identification, healthcare products and microcontrollers. This presentation explores low power design techniques, challenges and opportunities faced in an industrial research environment. The overview addresses design tradeoffs, design implications, and measurement results during their deployment in HPMS platforms.","PeriodicalId":20456,"journal":{"name":"Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07)","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2015-07-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81466671","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}