Zahra Paria Najafi-Haghi, F. Klemme, Hanieh Jafarzadeh, H. Amrouch, H. Wunderlich
{"title":"Robust Resistive Open Defect Identification Using Machine Learning with Efficient Feature Selection","authors":"Zahra Paria Najafi-Haghi, F. Klemme, Hanieh Jafarzadeh, H. Amrouch, H. Wunderlich","doi":"10.23919/DATE56975.2023.10136961","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136961","url":null,"abstract":"Resistive open defects in FinFET circuits are reliability threats and should be ruled out before deployment. The performance variations due to these defects are similar to the effect of process variations, which are mostly benign. In order not to sacrifice yield for reliability, the effect of defects should be distinguished from process variations. It has been shown that machine learning (ML) schemes are able to classify defective circuits with high accuracy based on the maximum frequencies $F_{max}$ obtained under multiple supply voltages $V_{dd} \\in V_{op}$. The paper at hand presents a method to minimize the number of required measurements. Each supply voltage $V_{dd}$ defines a feature $F_{max}(V_{dd})$. A feature selection technique is presented, which also uses the already available $F_{max}$ measurements. It is shown that ML-based techniques can work efficiently and accurately with this reduced number of $F_{max}(V_{dd})$ measurements.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"173 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124238479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Gia Bao Thieu, Sven Gesper, G. P. Vayá, C. Riggers, Oliver Renke, Till Fiedler, Jakob Marten, Tobias Stuckenberg, Holger Blume, C. Weis, Lukas Steiner, C. Sudarshan, N. Wehn, Lennart M. Reimann, R. Leupers, Michael Beyer, D. Köhler, Alisa Jauch, Jan Micha Borrmann, Setareh Jaberansari, T. Berthold, Meinolf Blawat, Markus Kock, Gregor Schewior, Jens Benndorf, Frederik Kautz, Hans-Martin Bluethgen, C. Sauer
{"title":"ZuSE Ki-Avf: Application-Specific AI Processor for Intelligent Sensor Signal Processing in Autonomous Driving","authors":"Gia Bao Thieu, Sven Gesper, G. P. Vayá, C. Riggers, Oliver Renke, Till Fiedler, Jakob Marten, Tobias Stuckenberg, Holger Blume, C. Weis, Lukas Steiner, C. Sudarshan, N. Wehn, Lennart M. Reimann, R. Leupers, Michael Beyer, D. Köhler, Alisa Jauch, Jan Micha Borrmann, Setareh Jaberansari, T. Berthold, Meinolf Blawat, Markus Kock, Gregor Schewior, Jens Benndorf, Frederik Kautz, Hans-Martin Bluethgen, C. Sauer","doi":"10.23919/DATE56975.2023.10136978","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136978","url":null,"abstract":"Modern and future AI-based automotive applications, such as autonomous driving, require the efficient real-time processing of huge amounts of data from different sensors, like camera, radar, and LiDAR. In the ZuSE-KI-AVF project, multiple university and industry partners collaborate to develop a novel massively parallel processor architecture, based on a customized RISC-V host processor and an efficient high-performance vertical vector coprocessor. In addition, a software development framework is provided to efficiently program AI-based sensor processing applications. The proposed processor system was verified and evaluated on a state-of-the-art UltraScale+ FPGA board, reaching a processing performance of up to 126.9 FPS, while executing the YOLO-LITE CNN on 224x224 input images. Further optimizations of the FPGA design and the realization of the processor system on a 22nm FDSOI CMOS technology are planned.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124254301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Automated Energy-Efficient DNN Compression under Fine-Grain Accuracy Constraints","authors":"Ourania Spantidi, Iraklis Anagnostopoulos","doi":"10.23919/DATE56975.2023.10136954","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136954","url":null,"abstract":"Deep Neural Networks (DNNs) are utilized in a variety of domains, and their computation intensity is stressing embedded devices that have limited power budgets. DNN compression has been employed to achieve gains in energy consumption on embedded devices at the cost of accuracy loss. Compression-induced accuracy degradation is addressed through fine-tuning or retraining, which is not always feasible. Additionally, state-of-the-art approaches compress DNNs with respect to the average accuracy achieved during inference, which can be a misleading evaluation metric. In this work, we explore more fine-grain properties of DNN inference accuracy, and generate energy-efficient DNNs using signal temporal logic and falsification jointly through pruning and quantization. We offer the ability to control at run-time the quality of the DNN inference, and propose an automated framework that can generate compressed DNNs that satisfy tight fine-grain accuracy requirements. The conducted evaluation on the ImageNet dataset has shown over 30% in energy consumption gains when compared to baseline DNNs.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123505245","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
C. Tung, Biresh Kumar Joardar, P. Pande, J. Doppa, Hai Helen Li, K. Chakrabarty
{"title":"Dynamic Task Remapping for Reliable CNN Training on ReRAM Crossbars","authors":"C. Tung, Biresh Kumar Joardar, P. Pande, J. Doppa, Hai Helen Li, K. Chakrabarty","doi":"10.23919/DATE56975.2023.10137238","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137238","url":null,"abstract":"A ReRAM crossbar-based computing system (RCS) can accelerate CNN training. However, hardware faults due to manufacturing defects and limited endurance impede the widespread adoption of RCS. We propose a dynamic task remapping-based technique for reliable CNN training on faulty RCS. Experimental results demonstrate that the proposed low-overhead method incurs only 0.85% accuracy loss on average while training popular CNNs such as VGGs, ResNets, and SqueezeNet with the CIFAR-10, CIFAR-100, and SVHN datasets in the presence of faults.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128396568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L. Fesquet, Rosalie Tran, Xavier Lesage, Mohamed Akrarai, Stéphane Mancini, G. Sicard
{"title":"Low-Throughput Event-Based Image Sensors and Processing","authors":"L. Fesquet, Rosalie Tran, Xavier Lesage, Mohamed Akrarai, Stéphane Mancini, G. Sicard","doi":"10.23919/DATE56975.2023.10137168","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137168","url":null,"abstract":"This paper presents new kinds of image sensors based on TFS (Time to First Spike) pixels and DVS (Dynamic Vision Sensor) pixels, which take advantage of non-uniform sampling and redundancy suppression to reduce the data throughput. The DVS pixels only detect a luminance variation, while TFS pixels quantize luminance by measuring the time required to cross a threshold. Such image sensors output requests through an Address Event Representation (AER), which helps to reduce the data stream. The resulting event bitstream is composed of time, position, polarity, and magnitude information. Such a bitstream offers new possibilities for image processing such as event-by-event object tracking. In particular, we propose some processing to cluster events, filter noise and extract other useful features, such as a velocity estimation.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"83 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127024231","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accelerating Gustavson-based SpMM on Embedded FPGAs with Element-wise Parallelism and Access Pattern-aware Caches","authors":"Shiqing Li, Weichen Liu","doi":"10.23919/DATE56975.2023.10136958","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10136958","url":null,"abstract":"Gustavson's algorithm (i.e., the row-wise product algorithm) shows its potential as the backbone algorithm for sparse matrix-matrix multiplication (SpMM) on hardware accelerators. However, it still suffers from irregular memory accesses and thus its performance is bounded by the off-chip memory traffic. Previous works mainly focus on high bandwidth memory-based architectures and are not suitable for embedded FPGAs with traditional DDR. In this work, we propose an efficient Gustavson-based SpMM accelerator on embedded FPGAs with element-wise parallelism and access pattern-aware caches. First of all, we analyze the parallelism of Gustavson's algorithm and propose to perform the algorithm with element-wise parallelism, which reduces the idle time of processing elements caused by synchronization. Further, we show a counter-intuitive example in which the traditional cache leads to worse performance. Then, we propose a novel access pattern-aware cache scheme called SpCache, which provides quick responses to reduce bank conflicts caused by irregular memory accesses and combines streaming and caching to handle requests that access ordered elements of unpredictable length. Finally, we conduct experiments on the Xilinx Zynq-UltraScale ZCU06 platform with a set of benchmarks from the SuiteSparse matrix collection. The experimental results show that the proposed design achieves an average 1.62x performance speedup compared to the baseline.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130624497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OverlaPIM: Overlap Optimization for Processing In-Memory Neural Network Acceleration","authors":"Minxuan Zhou, Xuan Wang, T. Simunic","doi":"10.23919/DATE56975.2023.10137223","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137223","url":null,"abstract":"Processing in-memory (PIM) can accelerate neural networks (NNs) owing to its extensive parallelism and data movement minimization. The performance of NN acceleration on PIM heavily depends on software-to-hardware mapping, which indicates the order and distribution of operations across the hardware resources. Previous works optimize the mapping problem by exploring the design space of per-layer and cross-layer data layout, achieving speedup over manually designed mappings. However, previous works do not consider computation overlapping across consecutive layers. By overlapping computation, we can process a layer before its preceding layer fully completes, decreasing the execution latency of the whole network. The mapping optimization without overlap analysis can result in sub-optimal performance. In this work, we propose OverlaPIM, a new framework that integrates the overlap analysis with the DNN mapping optimization on PIM architectures. OverlaPIM adopts several techniques to enable efficient overlap analysis and optimization for the whole network mapping on PIM architectures. We test OverlaPIM on popular DNNs and compare the results to non-overlap optimization. Our experiments show that OverlaPIM can efficiently produce mappings that are 2.10x to 4.11x faster than the state-of-the-art mapping optimization framework.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"169 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132970547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yigit Tuncel, T. Basaklar, Mackenzie M Smithyman, J. Dórea, Vinícius Nunes De Gouvêa, Younghyun Kim, Ümit Y. Ogras
{"title":"Towards Smart Cattle Farms: Automated Inspection of Cattle Health with Real-Life Data","authors":"Yigit Tuncel, T. Basaklar, Mackenzie M Smithyman, J. Dórea, Vinícius Nunes De Gouvêa, Younghyun Kim, Ümit Y. Ogras","doi":"10.23919/DATE56975.2023.10137281","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137281","url":null,"abstract":"Cattle diseases have a significant negative impact not only on the animals' welfare but also on the economic performance of the cattle industry [1], [2]. For example, Bovine Respiratory Disease is responsible for approximately 75% of the morbidity and 57% of the mortality in US feedlots, which is estimated to cost the agriculture industry about $1B annually [1], [2]. The current management practice to diagnose and select cattle for treatment is a widespread clinical scoring system called DART (Depression, Appetite, Respiration, and Temperature). DART requires manual labor and skilled personnel, which is a limiting factor due to labor-shortage in several industry sectors, including agriculture [3]. Therefore, a continuous and automated IoT solution to predict the health state of a cow is a critical tool for the cattle industry.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130477160","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Lossless Sparse Temporal Coding for SNN-based Classification of Time-Continuous Signals","authors":"Johnson Loh, T. Gemmeke","doi":"10.23919/DATE56975.2023.10137112","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137112","url":null,"abstract":"Ultra-low power classification systems using spiking neural networks (SNN) promise efficient processing for mobile devices. Temporal coding represents activations in an artificial neural network (ANN) as binary signaling events in time, thereby minimizing circuit activity. Discrepancies in numeric results are inherent to common conversion schemes, as the atomic computing unit, i.e. the neuron, performs algorithmically different operations, thus potentially degrading the SNN's quality of service (QoS). In this work, a lossless conversion method is derived in a top-down design approach for continuous time signals using electrocardiogram (ECG) classification as an example. As a result, the converted SNN achieves identical results compared to its fixed-point ANN reference. The computations implied by the proposed method result in a novel hybrid neuron model located in between the integrate-and-fire (IF) and conventional ANN neuron, whose numerical result is equivalent to the latter. Additionally, a dedicated SNN accelerator is implemented in 22 nm FDSOI CMOS suitable for continuous real-time classification. The direct comparison with an equivalent ANN counterpart shows that power reductions of $2.32\\times$ and area reductions of $7.22\\times$ are achievable without loss in QoS.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126899147","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuhan Chen, Alireza Khadem, Xin He, Nishil Talati, Tanvir Ahmed Khan, T. Mudge
{"title":"PEDAL: A Power Efficient GCN Accelerator with Multiple DAtafLows","authors":"Yuhan Chen, Alireza Khadem, Xin He, Nishil Talati, Tanvir Ahmed Khan, T. Mudge","doi":"10.23919/DATE56975.2023.10137240","DOIUrl":"https://doi.org/10.23919/DATE56975.2023.10137240","url":null,"abstract":"Graphs are ubiquitous in many application domains due to their ability to describe structural relations. Graph Convolutional Networks (GCNs) have emerged in recent years and are rapidly being adopted due to their capability to perform Machine Learning (ML) tasks on graph-structured data. GCN exhibits irregular memory accesses due to the lack of locality when accessing graph-structured data. This makes it hard for general-purpose architectures like CPUs and GPUs to fully utilize their computing resources. In this paper, we propose PEDAL, a power-efficient accelerator for GCN inference supporting multiple dataflows. PEDAL chooses the best-fit dataflow and phase ordering based on input graph characteristics and GCN algorithm, achieving both efficiency and flexibility. To achieve both high power efficiency and performance, PEDAL features a light-weight processing element design. PEDAL achieves 144.5x, 9.4x, and 2.6x speedup compared to CPU, GPU, and HyGCN, respectively, and 8856x, 1606x, 8.4x, and 1.8x better power efficiency compared to CPU, GPU, HyGCN, and EnGN, respectively.","PeriodicalId":340349,"journal":{"name":"2023 Design, Automation & Test in Europe Conference & Exhibition (DATE)","volume":"26 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123173898","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}