K. Vadivel, B. Bruin, Roel Jordans, H. Corporaal, P. Jääskeläinen
{"title":"Prebypass: Software Register File Bypassing for Reduced Interconnection Architectures","authors":"K. Vadivel, B. Bruin, Roel Jordans, H. Corporaal, P. Jääskeläinen","doi":"10.1109/DSD57027.2022.00030","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00030","url":null,"abstract":"Exposed Datapath Architectures (EDPAs) with aggressively pruned datapath connectivity, where not all function units in the design have connections to a centralized register file, are promising solutions for energy-efficient computation. Direct bypassing of data between function units, without temporary copies to the register file, is a prime optimization for programming such architectures. However, traditional compiler frameworks, such as LLVM, assume that function units connect to register files and allocate all live variables in register files. This leads to schedule inefficiencies in terms of instruction-level parallelism and register accesses in the EDPAs. To address these inefficiencies, we propose Prebypass, a new optimization pass for EDPA compiler backends. Experimental results on an EDPA class of architecture, the Transport-Triggered Architecture, show that Prebypass improves the runtime, register reads, and register writes by up to 16%, 26%, and 37%, respectively, when the datapath is extremely pruned. Evaluation in a 28-nm FDSOI technology reveals that Prebypass improves the core-level energy by 17.5% over the current heuristic scheduler.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"38 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"122201372","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Adaptive Exploration Based Routing for Spatial Isolation in Mixed Criticality Systems","authors":"Nidhi Anantharajaiah, J. Becker","doi":"10.1109/DSD57027.2022.00032","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00032","url":null,"abstract":"Applications of different criticality are increasingly sharing the same System-on-Chip platform to be cost- and resource-effective. On such mixed criticality systems, spatial partitioning of resources is a commonly utilized technique to prevent interference between applications. At the communication level, the Network-on-Chip (NoC) used in such systems can aid by isolating network traffic within application regions. Topologies that develop in such partitions can be regular or irregular, requiring minimal and non-minimal routing. For the NoC to be flexible and support such varying network parameters, it is desirable that the routing algorithm can support communication for all possible topologies. Here, we investigate a topology-agnostic routing algorithm based on the Ant Colony Optimization (ACO) metaheuristic. The routing algorithm explores the NoC for feasible paths using special ant packets and discovers paths based on the history of already utilized paths and local traffic information. We aim to decrease the exploration time overhead by proposing an adaptive exploration technique. Compared to the static version, the proposed technique can decrease the exploration time overhead by up to 68% while maintaining comparable latency and throughput.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130658703","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Juan Nicolás Mendoza-Chavarría, Eric R. Zavala-Sánchez, Liliana Granados-Castro, I. A. Cruz-Guerrero, H. Fabelo, S. Ortega, Gustavo Marrero Callico, D. U. Campos‐Delgado
{"title":"Glioblastoma Classification in Hyperspectral Images by Nonlinear Unmixing","authors":"Juan Nicolás Mendoza-Chavarría, Eric R. Zavala-Sánchez, Liliana Granados-Castro, I. A. Cruz-Guerrero, H. Fabelo, S. Ortega, Gustavo Marrero Callico, D. U. Campos‐Delgado","doi":"10.1109/DSD57027.2022.00118","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00118","url":null,"abstract":"Glioblastoma is considered an aggressive tumor due to its rapid growth rate and diffuse pattern in various parts of the brain. Current in-vivo classification procedures are executed under the supervision of an expert. However, this methodology can be subjective and time-consuming. In this work, we propose a classification method for in-vivo hyperspectral brain images to identify areas affected by glioblastomas based on nonlinear spectral unmixing. This methodology follows a semi-supervised approach for the estimation of the end-members in a multi-linear model. To improve the classification results, we vary the number of end-members per class to address the spectral variability of each studied type of tissue. Once the set of end-members is obtained, the classification map is generated according to the end-member with the highest abundance in each pixel, followed by morphological operations to smooth the resulting maps. The classification results demonstrate that the proposed methodology achieves high performance in the regions of interest, with an accuracy above 0.75 and 0.96 in the inter- and intra-patient strategies, respectively. These results indicate that the proposed methodology has the potential to be used as an assistive tool in the diagnosis of glioblastoma in hyperspectral imaging.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"35 7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116648087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
B. Ghavami, Mani Sadati, M. Shahidzadeh, Zhenman Fang, Lesley Shannon
{"title":"Blind Data Adversarial Bit-flip Attack against Deep Neural Networks","authors":"B. Ghavami, Mani Sadati, M. Shahidzadeh, Zhenman Fang, Lesley Shannon","doi":"10.1109/DSD57027.2022.00126","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00126","url":null,"abstract":"Because of their high accuracy, deep neural networks (DNNs) have achieved amazing success in security-critical systems such as medical devices. It has recently been demonstrated that adversarial Bit-Flip Attacks (BFAs) against DNN hardware, which flip a very small number of bits, can result in catastrophic accuracy loss. The reliance on test data, however, is a significant drawback of previous state-of-the-art bit-flip attack methods. Access to such data is frequently not possible in applications containing sensitive or proprietary data. In this paper, we propose Blind Data Adversarial Bit-flip Attack (BDFA), a novel technique to enable BFA against DNN hardware without any access to the training or testing data. This is achieved by optimizing for a synthetic dataset, which is engineered to match the statistics of batch normalization across different layers of the network and the targeted label. Experimental results show that BDFA could decrease the accuracy of ResNet50 significantly, from 75.96% to 13.94%, with only 4 bit flips.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128410067","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Demystifying the TensorFlow Eager Execution of Deep Learning Inference on a CPU-GPU Tandem","authors":"Paul Delestrac, L. Torres, D. Novo","doi":"10.1109/DSD57027.2022.00066","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00066","url":null,"abstract":"Machine Learning (ML) frameworks are tools that facilitate the development and deployment of ML models. These tools are major catalysts of the recent explosion in ML models and hardware accelerators thanks to their high programming abstraction. However, such an abstraction also obfuscates the run-time execution of the model and complicates the understanding and identification of performance bottlenecks. In this paper, we demystify how a modern ML framework manages code execution from a high-level programming language. We focus our work on the TensorFlow eager execution, which remains obscure to many users despite being the simplest mode of execution in TensorFlow. We describe in detail the process followed by the runtime to run code on a CPU-GPU tandem. We propose new metrics to analyze the framework's runtime performance overhead. We use our metrics to conduct an in-depth analysis of the inference process of two Convolutional Neural Networks (CNNs) (LeNet-5 and ResNet-50) and a transformer (BERT) for different batch sizes. Our results show that GPU kernel executions need to be long enough to exploit thread parallelism and effectively hide the runtime overhead of the ML framework.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"118 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131802598","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Post-Quantum Enhanced TLS 1.3 on Embedded Devices","authors":"Dominik Marchsreiter, Martha Johanna Sepúlveda","doi":"10.1109/DSD57027.2022.00127","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00127","url":null,"abstract":"Most of today's Internet connections are protected through the Transport Layer Security (TLS) protocol. Its client-server handshake mechanism provides authentication, privacy and data integrity between communicating applications. It is also the security base for 5G connectivity. While currently considered secure, TLS is threatened by the dawn of quantum computing. In order to prepare for such an event, TLS must integrate quantum-secure (post-quantum) cryptography (PQC). Hybrid approaches that combine PQC and traditional cryptography are recommended by security agencies. Efficient PQC integration into TLS requires the exploration of a wide set of design parameters and platforms. To this end, this work presents the following contributions. First, a wide evaluation of PQC-enhanced TLS hybrid protocols, using end-to-end communication latency as the metric. Second, the exploration and benchmarking of these protocols on constrained embedded devices. Third, a wide traffic analysis, including the impact and behavior of PQC-enhanced hybrid TLS in real practical scenarios.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131963105","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of Early-exit Strategies in Low-cost FPGA-based Binarized Neural Networks","authors":"Minxuan Kong, Kris Nikov, J. Núñez-Yáñez","doi":"10.1109/DSD57027.2022.00035","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00035","url":null,"abstract":"In this paper, we investigate the application of early-exit strategies to quantized neural networks with binarized weights, mapped to low-cost FPGA SoC devices. The increasing complexity of network models means that hardware reuse and heterogeneous execution are needed, and this opens the opportunity to evaluate the prediction confidence level early on. We apply the early-exit strategy to a network model suitable for ImageNet classification that combines weights with floating-point and binary arithmetic precision. The experiments show an improvement in inference speed of around 20% using an early-exit network, compared with using a single primary neural network, with a negligible accuracy drop of 1.56%.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"117 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124326462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexandre Bordat, Petr Dobiáš, J. Kernec, David Guyard, Olivier Romain
{"title":"GPU Based Implementation for the Pre-Processing of Radar-Based Human Activity Recognition","authors":"Alexandre Bordat, Petr Dobiáš, J. Kernec, David Guyard, Olivier Romain","doi":"10.1109/DSD57027.2022.00085","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00085","url":null,"abstract":"The correlation between an ageing population globally and the increased risk of falling is a real challenge for health care infrastructures. This calls for the development of new ways to monitor the elderly at home. The confidentiality of radar data coupled with its richness of information can address weaknesses of existing technologies, namely, privacy and acceptance. The radar produces a large quantity of data that needs to be processed in real time to ensure a timely detection of fall/critical events necessary for the well-being of the elderly. We introduce a new embedded architecture using a GPU, allowing a gain in processing time compared to a CPU alone. We used an off-the-shelf frequency-modulated continuous-wave (FMCW) radar (Ancortek model SDR 980AD2). It is followed by a pre-processing chain consisting of a Fast Fourier Transform, Filter and Short Time Fourier Transform (STFT) to obtain time-velocity maps or spectrograms to extract characteristics of human activities such as walking. An implementation with cuFFT on Jetson Xavier increases the performance margin for the downstream of the processing chain, the acceleration factor being 10.49 compared to a state-of-the-art CPU architecture. Continuous monitoring of the subject will save lives, minimize injuries, reduce anxiety and prevent post-fall syndrome (PDS).","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"253 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"121034040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
K. Datta, S. Shirinzadeh, P. L. Thangkhiew, I. Sengupta, R. Drechsler
{"title":"Unlocking Sneak Path Analysis in Memristor Based Logic Design Styles","authors":"K. Datta, S. Shirinzadeh, P. L. Thangkhiew, I. Sengupta, R. Drechsler","doi":"10.1109/DSD57027.2022.00111","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00111","url":null,"abstract":"Memristors or Resistive Random Access Memory (RRAM) are emerging non-volatile memory devices that can be used for both storage and computing. In this type of memory, the information is stored in memory cells in the form of resistance. One of the most important challenges in memristive crossbars is the existence of sneak paths, which result in erroneous reading of memory cells. Most logic-in-memory techniques have emphasized improving the logic design perspective but have given minor importance to the sneak path issue. In this paper, we show the effect of sneak paths on crossbars of various sizes, and then analyze logic design approaches such as MAGIC and MAJORITY with respect to their immunity to sneak paths. Experimental results show that, with some extra overhead, we can eliminate the sneak path effect in various logic design methods.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"7 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129210800","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Víctor Soria Pardos, Max Doblas, Guillem López-Paradís, Gerard Candón, Narcís Rodas, Xavier Carril, Pau Fontova-Musté, Neiel Leyva, Santiago Marco-Sola, Miquel Moretó
{"title":"Sargantana: A 1 GHz+ In-Order RISC-V Processor with SIMD Vector Extensions in 22nm FD-SOI","authors":"Víctor Soria Pardos, Max Doblas, Guillem López-Paradís, Gerard Candón, Narcís Rodas, Xavier Carril, Pau Fontova-Musté, Neiel Leyva, Santiago Marco-Sola, Miquel Moretó","doi":"10.1109/DSD57027.2022.00042","DOIUrl":"https://doi.org/10.1109/DSD57027.2022.00042","url":null,"abstract":"The RISC-V open Instruction Set Architecture (ISA) has proven to be a solid alternative to licensed ISAs. In the past 5 years, a plethora of industrial and academic cores and accelerators have been developed implementing this open ISA. In this paper, we present Sargantana, a 64-bit processor based on RISC-V that implements the RV64G ISA, a subset of the vector instructions extension (RVV 0.7.1), and custom application-specific instructions. Sargantana features a highly optimized 7-stage pipeline implementing out-of-order write-back, register renaming, and a non-blocking memory pipeline. Moreover, Sargantana features a Single Instruction Multiple Data (SIMD) unit that accelerates domain-specific applications. Sargantana achieves a 1.26 GHz frequency in the typical corner, and up to 1.69 GHz in the fast corner, using 22nm FD-SOI commercial technology. As a result, Sargantana delivers a 1.77× higher Instructions Per Cycle (IPC) than our previous 5-stage in-order DVINO core, reaching 2.44 CoreMark/MHz. Our core design delivers comparable or even higher performance than other state-of-the-art academic cores under the Autobench EEMBC benchmark suite. This way, Sargantana lays the foundations for future RISC-V based core designs able to meet industrial-class performance requirements for scientific, real-time, and high-performance computing applications.","PeriodicalId":211723,"journal":{"name":"2022 25th Euromicro Conference on Digital System Design (DSD)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129113318","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}