2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers

Title Page III

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/mcsoc.2019.00002

Citations: 0

Message from the Chairs

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/mcsoc.2019.00005

Hitesh Sajnani, Chaiyong Ragkhitwetsagul, Manishankar Mondal

Citations: 0

A Hotspot-Pattern-Aware Routing Algorithm for Networks-on-Chip

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00040

Yaoying Luo, M. Meyer, Xin Jiang, Takahiro Watanabe

Abstract: A Network-on-Chip (NoC) is widely accepted as an advanced on-chip interconnect that replaces the traditional bus structure, and it is a promising solution for future many-core processors thanks to its better scalability and flexibility. Routers in a NoC make routing decisions according to a routing algorithm, and many routing algorithms have been proposed to improve NoC performance. Some routing algorithms perform well only under a specific traffic pattern and can perform poorly under others. Compared with uniform traffic, some complex hotspot patterns are closer to real workloads. Traffic-aware routing algorithms are designed to address this problem, but they commonly use virtual channels (VCs) or routing tables to predict the future traffic distribution, which incurs power and hardware overheads that cannot be ignored. To solve these problems, this paper proposes a VC-free, traffic-pattern-aware routing algorithm based on West-first and North-last routing. The algorithm contains a mechanism for detecting hotspot nodes and hotspot patterns, designed to improve NoC performance under different traffic patterns. A low-cost hotspot information block is connected to each router to process hotspot information and detect hotspot patterns. Simulation results show that the proposed routing algorithm combines the advantages of the two existing routing algorithms and performs better across different traffic patterns.
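The proposed algorithm builds on two turn-model routing schemes named in the abstract, West-first and North-last. For reference only, the following is a minimal Python sketch of the textbook minimal (shortest-path) versions of those two baselines on a 2D mesh; it is not the authors' hotspot-aware algorithm. The coordinate convention (x grows eastward, y grows northward), the port names, and the function names are illustrative assumptions. Each function returns the output ports a router may choose from; an adaptive router would then pick one of them.

```python
from typing import List, Tuple

# Assumed convention: (x, y) mesh coordinates, x grows eastward, y grows northward.
Coord = Tuple[int, int]


def west_first(cur: Coord, dst: Coord) -> List[str]:
    """Minimal West-first routing: all westward hops are taken first;
    afterwards the router may choose among the productive E/N/S ports."""
    (cx, cy), (dx, dy) = cur, dst
    if (cx, cy) == (dx, dy):
        return ["LOCAL"]  # packet has arrived; eject to the local core
    if dx < cx:
        return ["W"]  # west must come first, so no adaptivity yet
    ports = []
    if dx > cx:
        ports.append("E")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports


def north_last(cur: Coord, dst: Coord) -> List[str]:
    """Minimal North-last routing: northward hops are taken last;
    before them the router may choose among the productive E/W/S ports."""
    (cx, cy), (dx, dy) = cur, dst
    if (cx, cy) == (dx, dy):
        return ["LOCAL"]
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy < cy:
        ports.append("S")
    if dy > cy and not ports:
        ports.append("N")  # go north only once no other productive hop remains
    return ports


if __name__ == "__main__":
    print(west_first((2, 2), (0, 3)))  # ['W']
    print(west_first((2, 2), (4, 0)))  # ['E', 'S']
    print(north_last((2, 2), (4, 4)))  # ['E']
    print(north_last((4, 2), (4, 4)))  # ['N']
```

In this sketch, West-first offers a choice of ports only to packets that need no westward hops, while North-last offers a choice only to packets that need no northward hops.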

Citations: 3

Performance Tuning of Tile Matrix Decomposition

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00011

Tomohiro Suzuki

Citations: 0

Real-Time Implementation of Time-Space Continuous Dynamic Programming for Air-Drawn Character Recognition Using GPUs

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00048

Aki Nakamura, Y. Okuyama, R. Oka

Citations: 0

Lightweight Semantics-Preserving Communication for Real-Time Automotive Software

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00059

Eugene Yip, Erjola Lalo, Gerald Lüttgen, A. Sailer

Citations: 2

An Efficient Implementation of a TAGE Branch Predictor for Soft Processors on FPGA

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00023

Katsunoshin Matsui, Md. Ashraful Islam, Kenji Kise

Abstract: Soft processors are becoming a common component of reconfigurable computing platforms such as FPGAs. In some accelerators, custom logic functions are implemented as processing elements alongside the soft processor. Because FPGA resources are fixed and limited, the soft processor should be implemented with as few logic resources as possible. One important part of the processor is the instruction fetch unit, whose performance depends on branch prediction. Conventional branch predictors such as bimodal or gshare are simple to implement, but their prediction accuracy is not good enough. The TAGE branch predictor, on the other hand, achieves better prediction accuracy but contains a complex logic path for branch prediction, which lowers the operating frequency. In this paper, we propose a branch predictor called pTAGE, which has almost the same prediction accuracy as TAGE while avoiding becoming the critical path of the processor. Branch prediction in pTAGE is pipelined, so a prediction result is available every clock cycle. We implement gshare, TAGE, and pTAGE in Verilog HDL and evaluate their operating frequency and prediction rate on an FPGA. In our results, pTAGE has almost the same prediction rate as TAGE and a 1.41 times higher operating frequency than TAGE. We also evaluate performance while varying the latency for updating branch prediction, and the results show that pTAGE achieves higher performance than gshare in deeply pipelined processors.
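The abstract uses gshare as a simple baseline against TAGE and pTAGE. As a point of reference only, here is a minimal Python sketch of a textbook gshare predictor: a table of 2-bit saturating counters indexed by the branch address XORed with a global history register. The table size (`index_bits`), the initial counter value, and the 2-bit PC shift (which assumes 4-byte-aligned instructions) are illustrative assumptions; this is neither the authors' Verilog implementation nor a model of TAGE or pTAGE.

```python
class GShare:
    """Textbook gshare: 2-bit saturating counters indexed by PC XOR global history."""

    def __init__(self, index_bits: int = 10) -> None:
        self.mask = (1 << index_bits) - 1
        # Counter values 0-1 predict not taken, 2-3 predict taken; start at weakly not taken.
        self.counters = [1] * (1 << index_bits)
        self.history = 0  # global history register; bit 0 holds the newest outcome

    def _index(self, pc: int) -> int:
        # Drop the two low PC bits (assumes 4-byte-aligned instructions), then XOR with history.
        return ((pc >> 2) ^ self.history) & self.mask

    def predict(self, pc: int) -> bool:
        return self.counters[self._index(pc)] >= 2

    def update(self, pc: int, taken: bool) -> None:
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the actual outcome into the global history.
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    predictor = GShare()
    pc = 0x100
    outcomes = [True, False] * 50  # an alternating branch, which history can capture
    hits = 0
    for taken in outcomes:
        hits += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    print(f"accuracy: {hits / len(outcomes):.2f}")
```

TAGE instead consults several tagged tables indexed with geometrically increasing history lengths and takes the prediction from the longest matching one; the abstract attributes TAGE's lower operating frequency to the complexity of its prediction logic.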

Citations: 4

Many Universal Convolution Cores for Ensemble Sparse Convolutional Neural Networks

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00021

Ryosuke Kuramochi, Youki Sada, Masayuki Shimoda, Shimpei Sato, Hiroki Nakahara

Abstract: The convolutional neural network (CNN) is one of the most successful neural network types and is widely used for embedded computer vision tasks. However, it requires a massive number of multiply-accumulate (MAC) operations with high power consumption, while modern tasks demand ever higher recognition accuracy. In this paper, we apply a sparseness technique to generate weak classifiers and build an ensemble CNN. There is a trade-off between recognition accuracy and inference speed, and we control the sparse (zero-weight) ratio to achieve both high performance and better recognition accuracy. We use P sparse-weight CNNs with a dataflow pipeline architecture that hides the performance overhead of evaluating multiple CNNs in the ensemble, and we set an adequate sparse ratio to adjust the number of operation cycles in each stage. The proposed ensemble CNN depends on dataset quality and has different layer configurations. We propose a universal convolution core that realizes variations of modern convolutional operations, and extend it to many cores with a pipelined architecture to achieve high-throughput operation. Whereas GPUs compute sparse convolutions inefficiently, our universal convolution cores realize an architecture with excellent pipeline efficiency. We measure the trade-off between recognition accuracy and inference speed using existing benchmark datasets and CNN models. By setting the sparsity ratio and the number of predictors appropriately, high-speed architectures are realized on the many universal cores while recognition accuracy improves over a conventional single-CNN realization. We implemented a prototype of the many universal convolution cores on a Xilinx Kintex UltraScale+ FPGA. Compared with a desktop GPU implementation of the ensemble, the proposed many-core accelerator for the ensemble sparse CNN is 3.09 times faster, consumes 4.20 times less power, and achieves 13.33 times better performance per watt. Therefore, by realizing the proposed ensemble method with many universal convolution cores, high-speed inference can be achieved while improving recognition accuracy compared with a conventional dense-weight CNN on a desktop GPU.
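The abstract controls the sparse (zero-weight) ratio of each member network and combines P sparse CNNs into an ensemble, but it does not say how weights are zeroed or how member outputs are combined. The following minimal Python sketch therefore assumes two common choices: magnitude pruning (zeroing the smallest-magnitude weights until the target ratio is reached) and averaging the members' class scores. It also assumes a zero-skipping datapath, so the count of nonzero weights stands in for the MAC operations a stage must perform. All function names are hypothetical, and this is not the authors' FPGA design.

```python
from typing import List, Sequence


def prune_to_ratio(weights: Sequence[float], sparse_ratio: float) -> List[float]:
    """Magnitude pruning (assumed technique): zero the smallest-|w| weights
    so that round(len(weights) * sparse_ratio) of them become zero."""
    if not 0.0 <= sparse_ratio <= 1.0:
        raise ValueError("sparse_ratio must lie in [0, 1]")
    n_zero = round(len(weights) * sparse_ratio)
    # Indices sorted by increasing magnitude; the first n_zero are pruned.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_zero]:
        pruned[i] = 0.0
    return pruned


def nonzero_macs(weights: Sequence[float]) -> int:
    """MACs left per output if zero weights are skipped (assumed datapath)."""
    return sum(1 for w in weights if w != 0.0)


def ensemble_average(member_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the class scores produced by the P member networks."""
    p = len(member_scores)
    return [sum(col) / p for col in zip(*member_scores)]


if __name__ == "__main__":
    w = [0.5, -0.05, 0.2, -0.8, 0.01, 0.3]
    sparse_w = prune_to_ratio(w, 0.5)
    print(sparse_w)  # [0.5, 0.0, 0.0, -0.8, 0.0, 0.3]
    print(nonzero_macs(w), "->", nonzero_macs(sparse_w))  # 6 -> 3
    # Class scores from P = 3 hypothetical member networks.
    scores = ensemble_average([[0.7, 0.3], [0.4, 0.6], [0.6, 0.4]])
    print(max(range(len(scores)), key=scores.__getitem__))  # prints 0 (class 0 wins)
```

Raising `sparse_ratio` lowers the per-stage MAC count, which corresponds to the knob the abstract describes for balancing operation cycles across pipeline stages, typically at some cost to each member's individual accuracy.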

Citations: 1

FPGA/Python Co-Design for Lane Line Detection on a PYNQ-Z1 Board

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00015

Koki Honda, Kaijie Wei, H. Amano

Citations: 3

Title Page I

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/mcsoc.2019.00001

Citations: 0