2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers, Page 2

Integrating Intra- and Intercellular Simulation of a 2D HL-1 Cardiac Model Based on Embedded GPUs

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00041

Baohua Liu, W. Shen, Xin Zhu, Xingyu Wangchen

Citations: 0

MITRACA: A Next-Gen Heterogeneous Architecture

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00050

Riadh Ben Abdelhamid, Y. Yamaguchi, T. Boku

Abstract: GPUs (Graphics Processing Units) and CPUs (Central Processing Units) offer sufficient and appropriate performance for massively parallel applications such as AI, big data, and material sciences. However, their real performance is far lower than their theoretical performance. The primary reason for this degradation is that they suffer from limited memory bandwidth and an inefficient interconnection topology that is not optimized for these types of applications. Thus, from the viewpoint of real computational performance, called computational efficiency, the FPGA (Field Programmable Gate Array) is becoming an attractive chip for these massively parallel applications. An FPGA can efficiently provide optimized communication and bridge different computing accelerators as customized hardware. In other words, FPGA-based hardware accelerators offer a convenient solution for both high performance and high memory bandwidth. However, one serious concern is usability. For example, FPGA design using a hardware description language is a meticulous task that requires specialized skill sets as well as a long time to market. An overlay architecture is an appropriate candidate for resolving this issue because it offers a software layer that simplifies FPGA programmability by abstracting the fabric resources. Thus, this article proposes an overlay architecture based on a tightly-connected many-core-based CGRA (Coarse-Grained Reconfigurable Architecture). It will help software engineers implement their applications seamlessly. Our final goal is not the current fine-grained FPGAs but new middle-to-coarse-grained programmable chips. If an ASIC (Application-Specific Integrated Circuit) implementation were adopted, the performance would be at least ten times higher than that of the current FPGA implementation because of the working frequency. In this article, the proposed overlay system provides a programmable interface that virtualizes FPGA resources and lets prospective users focus on high-level software programming.

Citations: 3

A Novel SLM-Based Virtual FPGA Overlay Architecture

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00018

Theingi Myint, M. Amagasaki, Qian Zhao, M. Iida, M. Kiyama

Abstract: To implement virtual field-programmable gate array (vFPGA) layers on physical devices, FPGA overlay technologies have been introduced to provide inter-FPGA bitstream compatibility. Conventional LUT-based vFPGA overlay architectures have very large resource overheads because LUT resource requirements increase as O(2^k) with an increasing number of inputs, k. In this paper, we propose novel SLM-based vFPGA overlay architectures that employ our previously proposed scalable logic module (SLM) as a logic cell. SLMs can cover the most frequently used logic functions with far fewer hardware resources than LUTs. Evaluation results show that a 6-input SLM-based vFPGA can reduce LUT and flip-flop resource usage by up to 21% and 21% on an Artix-7 FPGA, on a Kintex-7 FPGA, and on a Kintex UltraScale+ FPGA respectively, as compared to a LUT-based vFPGA of the same input size. Similarly, a 7-input SLM-based vFPGA can reduce LUT and flip-flop resource usage by up to 32% and 35% on an Artix-7 FPGA, 30% and 35% on a Kintex-7 FPGA, and 30% and 35% on a Kintex UltraScale+ FPGA respectively, as compared to a LUT-based vFPGA of the same input size. The delay of SLM-based vFPGA overlay architectures is almost the same as that of LUT-based vFPGA overlay architectures.
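
The exponential growth cited in the abstract above follows from the fact that a k-input LUT stores one configuration bit per input combination. A minimal Python sketch of that arithmetic, assuming only this textbook property: the function name `lut_config_bits` is illustrative and not from the paper, and the count covers truth-table bits only, ignoring routing and flip-flop overhead.

```python
def lut_config_bits(k: int) -> int:
    """Truth-table entries (configuration bits) in a k-input LUT.

    Illustrative helper, not taken from the paper: a k-input LUT needs
    one stored bit for each of the 2**k possible input combinations.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return 2 ** k


# Each extra input doubles the truth-table size.
for k in range(4, 8):
    print(f"{k}-input LUT: {lut_config_bits(k)} configuration bits")
```

Going from 6 to 7 inputs doubles the truth table from 64 to 128 bits, which is why a logic cell that covers only the frequently used functions can save resources at larger input sizes.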

Citations: 0

A Low-Latency and Flexible TDM NoC for Strong Isolation in Security-Critical Systems

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00029

M. Alonso, J. Flich, M. Turki, D. Bertozzi

Abstract: Shared security-critical systems are typically organized as a set of domains that must be kept separate. The network-on-chip (NoC) is key to delivering strong domain isolation, since many of its internal resources are shared between packets from different domains; therefore time-division multiplexing (TDM) is often implemented to avoid any form of interference. Prior approaches to TDM-based scheduling of NoCs lose relevance when they are challenged with conflicting requirements of latency optimization, area efficiency, architectural flexibility and fast reconfigurability. In many cases, aggressive latency optimizations are performed at the cost of timing channel protection. In this paper, we propose a new scheduling approach of time slots in 2D-mesh TDM NoCs that follows directly from the properties of the Channel Dependency Graph. As a result, the isolation-performance trade-off is consistently improved with respect to state-of-the-art solutions across the domain configuration space. When combined with a new token-based mechanism to dispatch scheduling directives, our approach enables the effective reconfiguration of the number of domains, unlike the static nature of most previous proposals.
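
The basic idea of time-division multiplexing referred to in the abstract above can be sketched in a few lines of Python. This is a plain round-robin slot assignment written for illustration only; it is not the Channel Dependency Graph based schedule the paper proposes, and the names `slot_owner` and `can_inject` are hypothetical.

```python
def slot_owner(cycle: int, num_domains: int) -> int:
    """Domain that owns the shared resource in this cycle (round-robin)."""
    if num_domains <= 0:
        raise ValueError("num_domains must be positive")
    return cycle % num_domains


def can_inject(cycle: int, domain: int, num_domains: int) -> bool:
    """A domain may use the resource only during its own time slot."""
    return slot_owner(cycle, num_domains) == domain


# With three domains the slots repeat with a period of three cycles.
schedule = [slot_owner(c, 3) for c in range(6)]
print(schedule)
```

Because each domain only ever uses its own slots, traffic from one domain cannot delay traffic from another, at the cost of each domain waiting for its turn. Changing `num_domains` at run time corresponds loosely to the reconfiguration the abstract mentions.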

Citations: 5

A Traffic-Robust Routing Algorithm for Network-on-Chip Systems

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00037

Siying Xu, M. Meyer, Xin Jiang, Takahiro Watanabe

Abstract: Network-on-chip (NoC) has been proposed as a better interconnection method than the bus architecture. Recently, a large number of routing algorithms have been proposed to improve the network performance. They usually show their benefits under particular traffic patterns. However, traffic patterns are generally unknown in advance and vary according to the application due to the behavioral diversity between inter-core and memory access communications. In this paper, a local traffic pattern detecting mechanism is proposed to detect the current traffic patterns including uniform, transpose, hotspot and real workloads, and then the routing algorithm will be switched to the most suitable one according to the detection result. Experimental results show that the traffic pattern can be accurately detected. For the hotspot traffic pattern, the success rate of the detector can reach up to 100 percent when the hotspot percentage is larger than 8. With the help of the proposed traffic-robust routing algorithm, the network can always work with a more suitable routing algorithm and achieve better performance.

Citations: 2

Enhanced ID Authentication Scheme Using FPGA-Based Ring Oscillator PUF

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00052

Van-Toan Tran, Quang-Kien Trinh, Van‐Phuc Hoang

Abstract: The FPGA-based ring oscillator (RO) PUF is very popular for its unique properties and easy implementation. However, the designs are normally expensive, and the RO frequency is highly sensitive to operating conditions and other types of global variations. In addition, the local variations are also highly correlated, which normally requires a complex identification (ID) extraction algorithm and/or a large number of ROs. In this work, by using statistical analysis, we have experimentally shown that the RO frequencies are very sensitive to global variation factors. Fortunately, their local process variations within a die are relatively consistent regardless of the operating condition, and this can be used for unique ID extraction. Furthermore, we have proposed an ID authentication scheme using an FPGA-based RO PUF. Our proposed scheme allows the local variation characteristics to be fully extracted by using an almost technology- and vendor-agnostic PUF circuit. In addition, the ID extraction circuit is kept simple and compact, so that the overall design is area- and energy-efficient. The experimental results show a very good level of reliability (99.94%) for a design of 32 ROs in different physical FPGAs.
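
The observation in the abstract above, that local variations stay consistent while global conditions shift all frequencies, can be illustrated with the classic pairwise-comparison RO PUF. The Python sketch below is that textbook technique, not the authors' extraction scheme, which the abstract does not detail. The frequency values are made up, and `extract_id` and `hamming_distance` are hypothetical helper names.

```python
from typing import List


def extract_id(freqs_mhz: List[float]) -> List[int]:
    """One ID bit per disjoint RO pair: 1 if the first oscillator is faster."""
    return [
        1 if freqs_mhz[i] > freqs_mhz[i + 1] else 0
        for i in range(0, len(freqs_mhz) - 1, 2)
    ]


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Number of bit positions in which two IDs differ."""
    return sum(x != y for x, y in zip(a, b))


# Made-up frequencies for 8 ROs on one die (illustrative data only).
nominal = [201.3, 199.8, 200.4, 202.1, 198.7, 199.9, 203.0, 200.2]
# A global effect (e.g. temperature) modelled as a uniform 3% slowdown.
hot = [f * 0.97 for f in nominal]

id_nominal = extract_id(nominal)
id_hot = extract_id(hot)
print(id_nominal, id_hot, hamming_distance(id_nominal, id_hot))
```

A uniform scaling preserves the ordering within every pair, so both conditions yield the same ID. Real devices do not shift perfectly uniformly, which is why reliability figures such as the one reported in the abstract are measured rather than assumed.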

Citations: 2

Distributed O(N) Linear Solver for Dense Symmetric Hierarchical Semi-Separable Matrices

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00008

Chenhan D. Yu, Severin Reiz, G. Biros

Abstract: We present a distributed memory algorithm for the approximate hierarchical factorization of symmetric positive definite (SPD) matrices. Our method is based on the distributed memory GOFMM, an algorithm that appeared in SC18 (doi:10.1109/SC.2018.00018). GOFMM constructs a hierarchical matrix approximation of an arbitrary SPD matrix that compresses the matrix by creating low-rank approximations of the off-diagonal blocks. The GOFMM method has no guarantee of success for arbitrary SPD matrices. (This is similar to the SVD; not every matrix admits a good low-rank approximation.) But for many SPD matrices, GOFMM does enable compression that results in fast matrix-vector multiplication that can reach N log N time, as opposed to the N^2 required for a dense matrix. GOFMM supports shared and distributed memory parallelism. In this paper, we build an approximate "ULV" factorization based on the Hierarchically Semi-Separable (HSS) compression of the GOFMM. This factorization requires O(N) work (given the compressed matrix) and O(N/p) + O(log p) time on p MPI processes (assuming a hypercube topology). The previous state-of-the-art required O(N log N) work. We present the factorization algorithm, discuss its complexity, and present weak and strong scaling results for the "factorization" and "solve" phases of our algorithm. We also discuss the performance of the inexact ULV factorization as a preconditioner for a few exemplary large dense linear systems. In our largest run, we were able to factorize a 67M-by-67M matrix in less than one second, and solve a system with 64 right-hand sides in less than one-tenth of a second. This run was on 6,144 Intel "Skylake" cores on the SKX partition of the Stampede2 system at the Texas Advanced Computing Center.

Citations: 6

Towards an Efficient Hardware Architecture for Odd-Even Based Merge Sorter

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00043

Elsayed A. Elsayed, Kenji Kise

Citations: 8

Tumour Detection using Convolutional Neural Network on a Lightweight Multi-Core Device

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00020

T. Teo, Weihao Tan, Y. Tan

Abstract: Convolutional neural networks (CNNs) have been the main driving force behind image classification and are widely used. Large amounts of processing power and computational complexity are required to mimic the human brain in image classification. Such complexity may result in large, bulky systems. A lack of such resources, while possible, may result in a rather limited use case and, as such, a constrained functional implementation. One solution is to explore the use of Multicore Systems on Chip (MCSoC). CNNs, however, have commonly been built on Graphics Processing Unit (GPU) based machines. In this paper, we reduce the overall size of a CNN while retaining a satisfactory level of accuracy so that it is better suited to be deployed in an MCSoC environment. We trained a CNN model that was validated on detecting malignant tumor cells. The results show a significant boost in functionality in the form of faster inference times and smaller model parameter sizes, deploying neural networks in an environment that would otherwise have seemed less practical. Efficient inference networks on lightweight systems can serve as an inexpensive and physically small alternative to existing Artificial Intelligence (AI) systems that are generally costly, bulky and power hungry.

Citations: 5

Efficient Search-Space Encoding for System-Level Design Space Exploration of Embedded Systems

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00046

Valentina Richthammer, M. Glaß

Abstract: For Design Space Exploration (DSE) of embedded systems as a combinatorial Multi-Objective Optimization Problem (MOP), metaheuristic optimization approaches are typically employed to determine high-quality solutions within limited optimization time. This requires the encoding of implementations from the design space in a search space which represents the available degrees of freedom for the optimization approach. Determining an encoding that ensures all design constraints are met by construction is, however, impossible for multi-/many-core DSE problems, so that the search space contains infeasible solutions. While state-of-the-art DSE techniques repair infeasible solutions, little to no attention has been paid to the efficiency of the resulting encoding w.r.t. its suitability for the employed optimization approach. Therefore, we formally define requirements for an efficient search space and analyze the drawbacks of automatically generated inefficient encodings. We furthermore present efficient search-space encodings for a state-of-the-art hybrid optimization approach suitable for a wide range of MOPs. The proposed encodings significantly reduce the required degree of repair, allowing us to introduce a feedback loop from repaired solutions in the design space to the respective encoded solutions in the efficient search space to further improve the optimization. The positive effects of the proposed efficient encoding and design-space feedback are demonstrated for system-level DSE using benchmarks from the domains of embedded many-core as well as networked automotive systems. Compared to inefficient search spaces from literature, significant enhancements in both optimization quality and time are observed. Furthermore, we propose metrics to quantify search-space efficiency which provide novel insights into the interdependence of search space and design space for multi-/many-core DSE.

Citations: 3