{"title":"TernaryNeRF: Quantizing Voxel Grid-based NeRF Models","authors":"Seungyeop Kang, S. Yoo","doi":"10.1109/RSP57251.2022.10039009","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039009","url":null,"abstract":"Photo-realistic neural rendering, represented by neural radiance field (NeRF), is considered to be a key technology for AR/VR applications and has been actively studied in recent years. In order to enable widespread adoptions of AR/VR, it is critical to enable low-cost and high-quality rendering on mobile and server systems. In our work, we investigate the feasibility of low-precision representation on the two state-of-the-art NeRF models, InstantNeRF and TensoRF. Our proposed quantization is based on our observation on the characteristics of trained NeRF models. In order to reduce the model size while limiting the loss of rendering quality due to model compression, we propose quantizing the portion of model which dominates the total model size while being robust to aggressive quantization. In our experiments, we demonstrate our proposed ternary quantization can reduce by $7\\times \\sim 15\\times$ the model sizes of state-of-the-art NeRF models at a negligible loss of rendering quality, which, we consider, will contribute to the AR/VR adoptions on mobile and server systems.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129624865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing embedded AI-based object detection using multi-view approach","authors":"Z. Ning, Mostafa Rizk, A. Baghdadi, J. Diguet","doi":"10.1109/RSP57251.2022.10039026","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039026","url":null,"abstract":"Object detection based on convolutional neural network (CNN) is widely used in multitude emergent applications. Yet, the deployment of CNNs on embedded devices at the edge with reduced resources and power budget poses a real challenge. In this paper, we address this issue by enhancing the detection performance without impacting the inference speed. We investigate the use of multi-view for the same scene to achieve better detection performance. A novel system of distributed smart cameras is proposed where each camera integrates a CNN for detection. Implementation results show that using light networks on the distributed cameras can lead to better detection performance and a reduction in the overall consumed power.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"24 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127765892","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework that enables systematic analysis of mixed-signal applications on FPGA","authors":"Gabriel Rutsch, Maximilian Groebner, A. Sanders, Konrad Maier, W. Ecker","doi":"10.1109/RSP57251.2022.10039031","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039031","url":null,"abstract":"We present a framework that enables systematic analysis of mixed-signal application on FPGA and show its application during architecture validation of a power controller. The open source synthesizable model generator for mixed-signal blocks (msdsl) is used to create a synthesizable prototype of the analog power control application. A library of instrumentation elements enables control from a host computer, time control, analog event capture, analog stimulus and noise generation, as well as trace, read and write of arbitrary signals. This keeps the effort of building the FPGA application prototype low and provides good debugging and analysis capabilities. The end-result is a unique analysis framework for mixed-signal applications that offers almost real time analog simulation speed - thus considering software as well as analog and digital hardware - no risk of damaging equipment and simulator alike analysis and debugging capabilities at a low overhead through an instrumentation library.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"472 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121223951","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automatically Restructuring HDL Modules for Improved Reusability in Rapid Synthesis","authors":"Jakob Wenzel, C. Hochberger","doi":"10.1109/RSP57251.2022.10039003","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039003","url":null,"abstract":"Implementing nontrivial HDL designs can take a lot of time. Particularly for FPGAs, vendor tools tend to become slower, since the devices grow and thus, also the designs grow. It is therefore desirable to create mechanisms that speed up the implementation. Combining pre-implemented blocks to build the final design can be one such mechanism. It can help to reduce the time required for incremental builds, or it can reduce the time required to build families of designs. Yet, typical HDL code is not structured for this purpose. Many modules do not have the right size to be used as pre-implemented blocks. In this paper, we present a methodology to automatically analyze and modify existing HDL code such that the resulting module structure fits the purpose of pre-implementing the modules. To this end, we try to isolate parameters of the HDL code such that we have to reimplement only a small number of modules after a parameter change. The resulting tool is available as open-source software. We have tested our methodology using multiple different benchmark sets, which in total contain thousands of modules. On average, we can extract around 10% of the parameters into smaller modules.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"64 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122355513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Machine Learning-Based Hard/Soft Logic Trade-offs in VTR","authors":"Ritwik Sinha, S. A. Damghani, K. Kent","doi":"10.1109/RSP57251.2022.10039002","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039002","url":null,"abstract":"Circuit optimization, in any application, is of high importance since it not only improves the efficiency of the intended purpose but also enhances the quality of the final product. It enables the circuit designer to cater to the specific needs of the customer. For circuit optimization to occur, we need to elaborate these circuits on a primary level and perform synthesis operations. Previous research shows that the investigation of improvements to different Hardware Description Language (HDL) elaboration phases, was completely closed source. Verilog To Routing (VTR) is an open-source Electronic Design Automation (EDA) tool. ODIN II is the VTR synthesizer that parses the input Verilog, elaborates its Abstract Syntax Tree (AST), performs the partial mapping according to the architecture file, and performs optimizations such as unused logic removal. To that end, the hard versus soft logic trade-off aims to optimize the performance of the circuit. This project focuses on using machine learning approaches to make synthesis tools intelligent enough to decide this ratio on their own, without the need for human intervention, and based on some predefined criteria. This paper discusses the criteria for having less latency or less critical path delay in the circuit. Also, it aims at providing this level of intelligence at an earlier stage in the VTR pipeline to make better use of this information.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"74 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114726688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ANN-based Performance Estimation of Embedded Software for RISC-V Processors","authors":"Weiyan Zhang, Mehran Goli, Alireza Mahzoon, R. Drechsler","doi":"10.1109/RSP57251.2022.10039004","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039004","url":null,"abstract":"The demand for optimized and efficient embedded software is increasing in many applications such as the Internet of Things (IoT) or other Cyber-Physical Systems (CPS). Hence, early performance analysis of embedded software is essential to perform Design Space Exploration (DSE), ensure efficiency, and meet time-to-market constraints. Designers usually use real hardware, simulators, or static analyzers to obtain the performance. However, these methods suffer from serious drawbacks as real hardware is not available in the early stage of the design process, simulators either do not support any timing accuracy or require large execution time, and static analyzers need details of the hardware microarchitecture. In this paper, we present a novel Artificial Neural Network (ANN)-based approach that allows a fast and accurate performance estimation of embedded software for RISC-V processors in the early design phases. This can significantly reduce the burden on designers to perform DSE. The proposed approach takes advantage of the dynamic analysis technique and analytical models and does not require any microarchitecture-related parameters such as cache misses, cache hits, and memory-level parallelism. We compare our proposed microarchitecture-independent approach with state-of-the-art in terms of speed and accuracy. Our experiments on various benchmarks demonstrate that the proposed approach achieves a speed-up of $4.41\\times$ compared to a RISC-V Virtual Prototype (VP) at the Electronic System Level (ESL), while the estimation results have only a Mean Absolute Percentage Error (MAPE) of 2%.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126998875","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Case for Second-Level Software Cache Coherency on Many-Core Accelerators","authors":"Arthur Vianès, F. Pétrot, F. Rousseau","doi":"10.1109/RSP57251.2022.10038999","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10038999","url":null,"abstract":"Cache and cache-coherence are major aspects of today's high performance computing. A cache stores data as cache-lines of fixed size, and coherence between caches is guaranteed by the cache-coherence protocol which operates on fixed size coherency-blocks. In such systems cache-lines and coherency-blocks are usually the same size and are relatively small, typically 64 bytes. This size choice is a trade-off selected for general-purpose computing: it minimizes false-sharing while keeping cache-maintenance traffic low. False-sharing is considered an unnecessary cache-coherence traffic and it decreases performances. However, for dedicated accelerator this trade-off may not be appropriate: hardware in charge of cache-coherence is expensive and not well exploited by most accelerator applications as by construction these applications minimize false-sharing. This paper investigates the possibility of an alternative trade-off of cache-coherency and cache-maintenance block size for many-core accelerators, by decoupling coherency-block and cache-lines sizes. Interests, advantages and difficulties are presented and discussed in this paper. Then we also discuss needs of software and hardware modifications in prototypes and the capability of such prototypes to evaluate different coherence-block sizes.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"191 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"120976749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Marine Objects Detection Using Deep Learning on Embedded Edge Devices","authors":"Dominique Heller, Mostafa Rizk, R. Douguet, A. Baghdadi, J. Diguet","doi":"10.1109/RSP57251.2022.10039025","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039025","url":null,"abstract":"Artificial Intelligence techniques based on convolution neural networks (CNNs) are now dominant in the field of object detection and classification. The deployment of CNNs on embedded edge devices targeting real-time inference sets a challenge due to the limited computing resources and power budgets. Several optimization techniques such as pruning, quantization and use of light neural networks enable the real-time inference but at the cost of precision degradation. However, using efficient approaches to apply the optimization techniques at training and inference stages enable high inference speed with limited degradation of detection performance. In this paper, we revisit the problem of detecting and classifying maritime objects. We investigate different versions of the You Only Look Once (YOLO), a state-of-the-art deep neural network, for real-time object detection and compare their performance for the specific application of detecting maritime objects. The trained YOLO networks are efficiently optimized targeting three recent edge devices: Nvidia Jetson Xavier AGX, AMD-Xilinx Kria KV260 Vision AI Kit, and Movidius Myriad X VPU. The proposed deployments demonstrate promising results with an inference speed of 90 FPS and a limited degradation of 2.4% in mean average precision.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132446171","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Early prototyping and testing of CERN LHC CMS high-granularity calorimeter slow-control system","authors":"Martim Rosado, S. Mallios, P. Tomás, N. Roma, A. David","doi":"10.1109/RSP57251.2022.10039014","DOIUrl":"https://doi.org/10.1109/RSP57251.2022.10039014","url":null,"abstract":"The Compact Muon Solenoid (CMS) high-granularity calorimeter (HGCAL) upgrade for CERN's Large Hadron Collider (LHC) high-luminosity phase is a detector with more than 6 million channels that will provide precise sensing and measurement of position, timing, and energy of the particles produced in the collisions of the beams. The HGCAL electronics are a large and complex set of processing systems split into front-end and back-end. The front-end, located in the experimental cavern, consists of $\\boldsymbol{\\approx 150}$ thousand radiation tolerant ASICs. The high-density FPGA-based back-end is housed away from the radiation area in a set of Advanced Telecommunications Computing Architecture (ATCA) boards and crates hosting $\\boldsymbol{\\approx 100}$ FPGAs. Each ATCA back-end board will comprise one (or two) FPGAs, managing up to $\\boldsymbol{\\approx 120}$ optical links, each providing a transmission rate of 10.24 Gb/s between the back-end and the front-end electronics. Each back-end FPGA is responsible for configuring and monitoring up to $\\boldsymbol{\\approx 3500}$ front-end ASICs and will be controlled by software running on a back-end MPSoC that provides the entry point for the whole control procedure. This paper presents the design and implementation of the prototyping infrastructure deployed to test and validate the slow-control block of the HGCAL back-end electronics, together with the related interfaces with the controller MPSoC and the front-end transceiver ASICs. The required functionalities have been validated with a ZCU102 Xilinx Ultrascale+ development board, which emulated the back-end elements that are still under development and not yet available for this comprehensive test. This development board was connected to other custom ASIC development boards via optical links, emulating the front-end side of the system, also still under development. Besides providing reliable testing and validation of the operation of the whole infrastructure, the prototyping platform also allowed to attain the required software/hardware portability that ensures easy integration/replacement of all the (still) emulated components with their final implementations.","PeriodicalId":201919,"journal":{"name":"2022 IEEE International Workshop on Rapid System Prototyping (RSP)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122421126","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}