{"title":"Work-in-Progress: A concept of a hardware design environment with the functional language Elixir","authors":"Hideki Takase, K. Matsui, Yoshihiro Ueno, Masakazu Mori, Yuki Hisae, Susumu Yamazaki","doi":"10.1145/3349567.3351715","DOIUrl":"https://doi.org/10.1145/3349567.3351715","url":null,"abstract":"The functional language Elixir is designed to be effective for the application. One of the most considerable feature of Elixir is that it is easy to realize the parallel processing with the standard library, such as Flow. In this paper, we study a design environment for hardware circuits using Elixir as a design language. We propose a synthesize flow for data flow hardware on the FPGA from the native Elixir code. Our method synthesizes functional equivalence circuits from the description of Enum and Flow in Elixir, which are libraries for direct manipulation and parallel processing of data collection in Elixir. Data flow is implemented base on the pipeline operator |> which connects the processing relation of the function by the data processing order. To realize a hardware design environment by Elixir, this paper shares current status of our implementation.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"20 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125530555","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: An Improved Network Interface Card with Query Filter for Big Data Systems","authors":"Jinyu Zhan, Ying Li, Wei Jiang, Junting Wu, Jianping Zhu","doi":"10.1145/3349567.3351720","DOIUrl":"https://doi.org/10.1145/3349567.3351720","url":null,"abstract":"In this paper we approach to accelerate the data processing of storage and computing separated big data systems. We propose an improved Network Interface Card with Query Filter (NIC-QF), implemented by FPGA on storage nodes, to accelerate the data queries, which can also reduce the workload and communication overhead on computing nodes. NIC-QF is designed with query filtering accelerator and Network Interface Card (NIC) communicator, which can filter the original data on storage nodes as an implicit coprocessor and directly send the filtered data to computing nodes of big data systems. Filter units in NIC-QF can perform multiple SQL tasks in parallel, and each filter unit is internally pipelined, which can further speed up the data processing. Experiments with two benchmarks demonstrate the efficiency of our approach, which can achieve average up to 65.56% faster than the traditional approach.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"29 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127834538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: A Scalable Stochastic Number Generator for Phase Change Memory Based In-Memory Stochastic Processing","authors":"Supreeth Mysore Shivanandamurthy, Ishan G. Thakkar, S. A. Salehi","doi":"10.1145/3349567.3351717","DOIUrl":"https://doi.org/10.1145/3349567.3351717","url":null,"abstract":"Stochastic computing based Processing-In-Memory (PIM) architectures (e.g., [1]) can provide massive parallelism with higher energy-efficiency, for implementing complex computations in main memory. However, stochastic computing arithmetic requires random bit streams generated by stochastic number generators (SNGs), which account for significant area and energy consumption. Moreover, SNGs' numerical precision needs improvement to reduce errors in stochastic computations [1]. Thus, low numerical precision and high implementation overheads of SNGs can offset the benefits of adopting stochastic computing in PIM architectures. In this paper, we exploit the inherent stochasticity of Phase Change Memory (PCM) cells to design a scalable and area-energy efficient SNG for PCM-based stochastic PIM architectures. Our designed SNG can achieve up to ~300× lower area and up to ~250× lower energy consumption with better numerical precision, compared to the Linear Feedback Shift Register (LFSR) based conventional SNG from [2].","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"222 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122048099","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: DeVos: A Learning-based Delay Model of Voltage-Scaled Circuits","authors":"Dongning Ma, Siyu Shen, Xun Jiao","doi":"10.1145/3349567.3351725","DOIUrl":"https://doi.org/10.1145/3349567.3351725","url":null,"abstract":"Dynamic voltage and frequency scaling (DVFS) is a typical method to reduce energy consumption of circuits. However, it may cause timing errors if the frequency is not set properly under scaled voltages. To alleviate this issue, this paper proposes DeVos, a supervised learning model that can predict dynamic delay of voltage-scaled circuits based on their input workload. We measure the dynamic delay using switching activity generated through gate-level simulation of a post place-and-route design in TSMC 45nm process. We then look for features in the input data that influence dynamic path sensitization. Using these features we apply random forest to construct a predictive model trained and tested using random data. Across a wide range of voltage levels, DeVos achieves on average a mean absolute percentage error (MAPE) of less than 5%. To our best knowledge, DeVos is the first dynamic delay model of voltage-scaled circuits and can be used to enable accurate dynamic voltage and frequency scaling.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121859255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: DORY: Lightweight Memory Hierarchy Management for Deep NN Inference on IoT Endnodes","authors":"A. Burrello, Francesco Conti, Angelo Garofalo, D. Rossi, L. Benini","doi":"10.1145/3349567.3351726","DOIUrl":"https://doi.org/10.1145/3349567.3351726","url":null,"abstract":"IoT endnodes often couple a small and fast L1 scratchpad memory with higher-capacity but lower bandwidth and speed L2 background memory. The absence of a coherent hardware cache hierarchy saves energy but comes at the cost of labor-intensive explicit memory management, complicating the deployment of algorithms with large data memory footprint, such as Deep Neural Network (DNN) inference. In this work, we present DORY, a lightweight software-cache dedicated to DNN Deployment Oriented to memoRY. DORY leverages static data tiling and DMA-based double buffering to hide the complexity of manual L1-L2 memory traffic management. DORY enables storage of activations and weights in L2 with less than 4% performance overhead with respect to direct execution in L1. We show that a 142 kB DNN achieving 79.9% on CIFAR-10 runs 3.2x faster compared to its execution directly from L2 memory while consuming 1.9x less energy.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125501379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: A Case for Design Space Exploration of Context-aware Adaptive Embedded Systems","authors":"Rajesh Kedia, M. Balakrishnan, K. Paul","doi":"10.1145/3349567.3351714","DOIUrl":"https://doi.org/10.1145/3349567.3351714","url":null,"abstract":"In this paper, we introduce Context-aware Adaptive Embedded Systems (CAES) and illustrate the complexity of early design space exploration (DSE) process using examples of two real systems that are being implemented in our group. Next, we give an overview of the proposed approach and present some preliminary results.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"288 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123291413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: Drama: A High Efficient Neural Network Accelerator on FPGA using Dynamic Reconfiguration","authors":"Yang Yang, Chao Wang, Xuehai Zhou","doi":"10.1145/3349567.3351727","DOIUrl":"https://doi.org/10.1145/3349567.3351727","url":null,"abstract":"In this paper, we propose a high efficient neural network accelerator on FPGA by using dynamic reconfiguration, named Drama. Firstly, we design a high-efficient hardware architecture and provide a hardware template that can generate optimal configuration for each layer. Then, to explore the key features of the neural network models, we employ a layer-clustering algorithm to classify different layers. After that, we transform CNN models into task sequences. To accomplish the execution of the sequence, the FPGA-based hardware is able to switch the accelerator with dynamic reconfiguration and offload the related tasks to the accelerator at runtime. Preliminary results on the FPGA platform demonstrate that Drama is able to improve the performance significantly due to the dynamic reconfiguration techniques.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115557598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: A Distributed Smart Camera Apparatus to Enable Scene Immersion","authors":"Erman Nghonda Tchinda, Danielle Tchuinkou Kwadjo, C. Bobda","doi":"10.1145/3349567.3351716","DOIUrl":"https://doi.org/10.1145/3349567.3351716","url":null,"abstract":"In this paper, we investigate the potential of an immersion technology system implemented on embedded devices. Our system consists of distributed smart cameras with overlapping views, covering numerous viewpoints of a monitored scene so that each smart camera knows the position of its neighbors. The system provides an on-demand panoramic field-of-view (FOV) in real-time. To generate the panoramic view, we design and implement an image stitching system using images captured from a subset of adjacent embedded cameras. We verify the effectiveness of our method in terms of quality of result (QoR) and computation efficiency. Initial results show up 12 FPS with 8MP cameras.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"9 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"117067904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Work-in-Progress: PreGC: Pre-migrating Valid Pages to Relieve Performance Cliff of 3D Solid-State Drives","authors":"Yajuan Du, Wei Liu, Rui Wang, Yao Zhou, J. Xue","doi":"10.1145/3349567.3351731","DOIUrl":"https://doi.org/10.1145/3349567.3351731","url":null,"abstract":"In order to satisfy the increased concern on SSD performance, this paper studied GC performance in the view of performance cliff and tail latency. At first, our preliminary experiments figure out that increased page migrations is the root cause of performance cliff. Then, combined with the existing works aiming at GC performance in 2D SSDs, including the partial GC and aggressive GC, a novel GC-assisting method PreGC is proposed to relieve GC-induced performance cliff. The key idea of PreGC is that a part of pages within victim blocks can be pre-migrated ahead of the time nearby GC invoking and when system is idle. Thus, normal page migrations induced by GC can be reduced and response time peaks would be lowered down. Experimental results show that PreGC can efficiently relieve the performance cliff by reducing migrated page numbers in normal GC as well as the tail latency while inducing negligible write amplification.","PeriodicalId":194982,"journal":{"name":"2019 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125452261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}