2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Publications

Title Page III
DOI: 10.1109/mcsoc.2019.00002, published 2019-10-01
Citations: 0
Message from the Chairs
Hitesh Sajnani, Chaiyong Ragkhitwetsagul, Manishankar Mondal
DOI: 10.1109/mcsoc.2019.00005, published 2019-10-01
Abstract: Software clone research is highly relevant to software engineering research and practice. Software clones often result from programmers copying and pasting code as a form of ad-hoc reuse. They can occur at many levels, from simple statement sequences to blocks, methods, classes, source files, subsystems, models, architectures and entire designs, and in all software artifacts (code, models, requirements or architecture documentation, etc.). Clones sometimes have a demonstrably bad influence on code quality, but other studies have shown that they can benefit the code if used carefully. In this workshop, we seek to discuss new and active results from the research community. In particular, IWSC aims to bring together researchers and practitioners to evaluate the current state of research, discuss common problems, discover opportunities for collaboration, exchange ideas, and explore synergies with similarity analysis in other areas and disciplines.
Citations: 0
A Hotspot-Pattern-Aware Routing Algorithm for Networks-on-Chip
Yaoying Luo, M. Meyer, Xin Jiang, Takahiro Watanabe
DOI: 10.1109/MCSoC.2019.00040, published 2019-10-01
Abstract: The Network-on-Chip (NoC) is widely accepted as an advanced on-chip interconnect that replaces the traditional bus structure. NoC is a promising solution for future many-core processors because it offers better scalability and flexibility. Routers in a NoC make routing decisions according to the routing algorithm, and many routing algorithms have been proposed to improve NoC performance. Some routing algorithms perform well only under a specific traffic pattern and can perform poorly under others. Compared to uniform traffic, some complex hotspot patterns are closer to reality. Traffic-aware routing algorithms are designed to solve this problem. They commonly use virtual channels (VCs) or routing tables to predict the future traffic distribution, which incurs non-negligible power and hardware overheads. To solve these problems, this paper proposes a VC-free, traffic-pattern-aware routing algorithm based on West-first routing and North-last routing. The algorithm contains a mechanism for detecting hotspot nodes and hotspot patterns, designed to improve NoC performance under different traffic patterns. A low-cost hotspot information block is attached to each router to process hotspot information and detect hotspot patterns. Simulation results show that the proposed routing algorithm combines the advantages of the two existing routing algorithms and performs better across different traffic patterns.
Citations: 3
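The abstract builds on two classic turn-model algorithms, West-first and North-last routing. The sketch below shows the output ports each one permits at a router in a 2D mesh. It assumes (x, y) coordinates with x growing east and y growing north, and it allows only minimal moves. The function names are hypothetical, and the sketch does not model the paper's hotspot detection.

```python
# Minimal sketch of two classic turn-model routing functions for a 2D mesh.
# Assumptions (not from the paper): coordinates are (x, y), x grows east,
# y grows north, and only minimal (shortest-path) moves are allowed.

def west_first(cur, dst):
    """Return the permitted output ports under West-first routing."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx == 0 and dy == 0:
        return ["LOCAL"]          # packet has arrived
    if dx < 0:
        return ["W"]              # all westward hops must be taken first
    ports = []
    if dx > 0:
        ports.append("E")
    if dy > 0:
        ports.append("N")
    if dy < 0:
        ports.append("S")
    return ports                  # adaptive choice among E/N/S


def north_last(cur, dst):
    """Return the permitted output ports under North-last routing."""
    dx, dy = dst[0] - cur[0], dst[1] - cur[1]
    if dx == 0 and dy == 0:
        return ["LOCAL"]
    ports = []
    if dx > 0:
        ports.append("E")
    if dx < 0:
        ports.append("W")
    if dy < 0:
        ports.append("S")
    # Northward hops are allowed only once no other move remains.
    return ports if ports else ["N"]
```

For a north-east destination, `west_first((0, 0), (2, 2))` offers two ports (`["E", "N"]`), while `north_last` offers only `["E"]`. For a south-west destination the roles reverse: `west_first((2, 2), (0, 0))` is deterministic (`["W"]`), while `north_last` is adaptive (`["W", "S"]`). Each algorithm is adaptive in different quadrants, which is one plausible reason to combine them.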
Performance Tuning of Tile Matrix Decomposition
Tomohiro Suzuki
DOI: 10.1109/MCSoC.2019.00011, published 2019-10-01
Abstract: In recent years, task-parallel algorithms have attracted attention for highly parallel architectures. Such algorithms aim to keep all computing resources busy without stalling by executing a large number of fine-grained tasks asynchronously while respecting data dependencies. Tile algorithms for the decomposition of dense matrices follow this approach and are implemented using a task-parallel programming model. In this article, we consider how to select the tile size, which is an important performance parameter.
Citations: 0
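The abstract does not name the decomposition or the tuning method, so the sketch below uses tiled Cholesky purely as an assumed example. It lists the kernel tasks (POTRF, TRSM, SYRK, GEMM) produced for an n x n matrix split into b x b tiles. This makes the basic tile-size trade-off visible: smaller tiles expose more tasks for parallel execution, but each task does less work. The helper name is hypothetical.

```python
import math

def tiled_cholesky_tasks(n, b):
    """List the kernel tasks of a right-looking tiled Cholesky factorization
    of an n x n matrix split into b x b tiles (hypothetical helper)."""
    nt = math.ceil(n / b)  # number of tile rows/columns
    tasks = []
    for k in range(nt):
        tasks.append(("POTRF", k, k))               # factor diagonal tile (k, k)
        for m in range(k + 1, nt):
            tasks.append(("TRSM", m, k))            # solve panel tile (m, k)
        for m in range(k + 1, nt):
            tasks.append(("SYRK", m, k))            # update diagonal tile (m, m)
            for j in range(k + 1, m):
                tasks.append(("GEMM", m, j, k))     # update off-diagonal tile (m, j)
    return tasks


if __name__ == "__main__":
    for b in (512, 256, 128):
        print(b, len(tiled_cholesky_tasks(1024, b)))
```

For n = 1024, tile sizes 512, 256 and 128 yield 4, 20 and 120 tasks respectively. Each task performs on the order of b^3 floating-point operations. Choosing b therefore balances available parallelism against per-task kernel efficiency and scheduling overhead. This is the general trade-off, not the selection method of the paper.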
Real-Time Implementation of Time-Space Continuous Dynamic Programming for Air-Drawn Character Recognition Using GPUs
Aki Nakamura, Y. Okuyama, R. Oka
DOI: 10.1109/MCSoC.2019.00048, published 2019-10-01
Abstract: Air-drawn character recognition is an input method based on human body movements. Time-Space Continuous Dynamic Programming (TSCDP) is an algorithm that can perform this task by detecting pre-defined trajectories in input videos. Because TSCDP requires massive computation, it is hard to make the system work in real time on a single processor. In this paper, we investigated the frames-per-second (fps) requirements of an air-drawn character recognition system using TSCDP. We analyzed the dependencies among the TSCDP calculations in order to parallelize them on GPUs. We evaluated the computation time on CPUs and GPUs in desktop and embedded environments. By comparing the results against the fps requirements, we confirmed that the proposed system runs in real time on real videos in both desktop and embedded environments.
Citations: 0
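The abstract does not give the TSCDP recurrence. As a simplified 1-D analogue, and a different, simpler technique, the sketch below implements streaming subsequence dynamic time warping. It shows how a continuous-DP-style spotter updates one column of accumulated costs per input frame. Within a frame, cell `tau` depends on cell `tau - 1` of the same frame and on cells `tau - 1` and `tau` of the previous frame. Dependencies of this kind must be analyzed before parallelization. The function name and the absolute-difference distance are assumptions.

```python
import math

def spot_stream(reference, stream, dist=lambda a, b: abs(a - b)):
    """Streaming subsequence DTW (hypothetical helper, not TSCDP): after each
    input frame, return the cost of the best match of `reference` ending there."""
    R = len(reference)
    prev = [math.inf] * R          # accumulated costs for the previous frame
    scores = []
    for x in stream:
        cur = [0] * R
        for tau in range(R):
            d = dist(x, reference[tau])
            if tau == 0:
                cur[0] = d          # free start: a match may begin at any frame
            else:
                # depends on the previous frame (prev) and on cur[tau - 1]
                cur[tau] = d + min(prev[tau - 1], prev[tau], cur[tau - 1])
        scores.append(cur[-1])      # cost of a full match ending at this frame
        prev = cur
    return scores
```

For example, `spot_stream([1, 2, 3], [5, 5, 1, 2, 3, 5])` returns `[9, 9, 3, 1, 0, 2]`. The minimum at index 4 marks the frame where the reference pattern ends, and a detector would threshold these per-frame scores.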
Lightweight Semantics-Preserving Communication for Real-Time Automotive Software
Eugene Yip, Erjola Lalo, Gerald Lüttgen, A. Sailer
DOI: 10.1109/MCSoC.2019.00059, published 2019-10-01
Abstract: The automotive industry is confronting the multi-core challenge, in which legacy and modern software must run correctly and efficiently in parallel. It does so by designing its software around the Logical Execution Time (LET) model. Such designs offer platform-independent and time-predictable implementations, but they assume that task communications complete instantaneously. It is therefore critical to implement timely data transfers between LET tasks, which may be on different cores, in order to preserve a design's data flow. In this paper, we develop a lightweight Static Buffering Protocol (SBP) that satisfies the LET communication semantics and supports signal-based communication with multiple signal writers. Our simulation-based evaluation with realistic industrial automotive benchmarks shows that the execution overhead of SBP is at most half that of the traditional Point-To-Point (PTP) communication method. Moreover, SBP needs on average 60% less buffer memory than PTP.
Citations: 2
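Under the LET model, a task reads its inputs at release and makes its outputs visible only at the end of its LET interval, however early it actually finishes. The sketch below illustrates that semantics with a single buffered signal. It is not the paper's SBP. `LetSignal`, `simulate`, the periods, and the rule "publish before read within one tick" are all assumptions for illustration.

```python
class LetSignal:
    """One signal shared between LET tasks (hypothetical helper, not SBP).
    Writes stay private until the writer's LET interval ends."""

    def __init__(self, initial):
        self.published = initial   # value visible to readers
        self.pending = None        # value produced but not yet published

    def write(self, value):
        self.pending = value

    def publish(self):
        if self.pending is not None:
            self.published = self.pending
            self.pending = None

    def read(self):
        return self.published


def simulate(horizon=20, producer_period=10, consumer_period=5):
    """Producer LET equals its period; the consumer reads at each release."""
    sig = LetSignal(0)
    log = []
    for t in range(horizon + 1):
        # 1) Producer instances whose LET interval ends now publish first.
        if t > 0 and t % producer_period == 0:
            sig.publish()
        # 2) Consumers released now read their inputs.
        if t % consumer_period == 0:
            log.append((t, sig.read()))
        # 3) Producer released now; its result is buffered immediately for
        #    simplicity, since under LET the actual finish time is not observable.
        if t % producer_period == 0:
            sig.write(t // producer_period + 1)
    return log
```

`simulate()` returns `[(0, 0), (5, 0), (10, 1), (15, 1), (20, 2)]`. The consumer released at t = 5 still sees the old value even though the producer's result already exists, because publication waits for the end of the producer's LET interval at t = 10. This deterministic data flow is what any LET communication protocol must preserve.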
An Efficient Implementation of a TAGE Branch Predictor for Soft Processors on FPGA
Katsunoshin Matsui, Md. Ashraful Islam, Kenji Kise
DOI: 10.1109/MCSoC.2019.00023, published 2019-10-01
Abstract: Soft processors are becoming a common component in reconfigurable computing platforms such as FPGAs. In some accelerators, custom logic functions are implemented as processing elements alongside the soft processor. Because FPGA resources are fixed and limited, it is desirable to implement the soft processor with as few logic resources as possible. One important part of the processor is the instruction fetch unit, whose performance depends on branch prediction. Conventional branch predictors such as bimodal or gshare are simple to implement, but their prediction accuracy is not good enough. The TAGE branch predictor, by contrast, achieves better prediction accuracy but contains a complex logic path, which lowers the operating frequency. In this paper, we propose a branch predictor called pTAGE, which has almost the same prediction accuracy as TAGE and avoids becoming the critical path of the processor. Prediction in pTAGE is pipelined, so a prediction result is available every clock cycle. We implement gshare, TAGE, and pTAGE in Verilog HDL and evaluate their operating frequency and prediction rate on an FPGA. The results show that pTAGE achieves almost the same prediction rate as TAGE with a 1.41 times higher operating frequency. We also evaluate performance while varying the latency of branch prediction updates; the results show that pTAGE outperforms gshare in deeply pipelined processors.
Citations: 4
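gshare, one of the simple baselines in the abstract, indexes a table of 2-bit saturating counters with the branch address XORed with the global branch history. The sketch below is a behavioral model of that textbook scheme, not of TAGE or pTAGE. The table and history sizes, and the assumption of 4-byte instructions, are illustrative choices.

```python
class Gshare:
    """Minimal gshare predictor: global history XOR PC indexes a table of
    2-bit saturating counters. Table sizes are illustrative assumptions."""

    def __init__(self, index_bits=10, history_bits=10):
        self.index_mask = (1 << index_bits) - 1
        self.history_mask = (1 << history_bits) - 1
        self.counters = [1] * (1 << index_bits)  # start weakly not-taken
        self.history = 0                          # most recent outcome in bit 0

    def _index(self, pc):
        # Drop the two low PC bits (assumes 4-byte instructions).
        return ((pc >> 2) ^ self.history) & self.index_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2  # True means "taken"

    def update(self, pc, taken):
        i = self._index(pc)  # same index that was used for the prediction
        c = self.counters[i]
        self.counters[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask
```

A strictly alternating branch (taken, not-taken, ...) shows why history helps. A per-PC bimodal counter mispredicts such a branch at least half the time. gshare maps the two recurring history values to two different counters and stops mispredicting once its history register has filled and both counters are trained. A hardware version performs this lookup in a single cycle, which is why more elaborate predictors such as TAGE can end up on the critical path.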
Many Universal Convolution Cores for Ensemble Sparse Convolutional Neural Networks
Ryosuke Kuramochi, Youki Sada, Masayuki Shimoda, Shimpei Sato, Hiroki Nakahara
DOI: 10.1109/MCSoC.2019.00021, published 2019-10-01
Abstract: The convolutional neural network (CNN) is one of the most successful neural network models, and it is widely used in embedded computer vision tasks. However, it requires a massive number of multiply-accumulate (MAC) operations with high power consumption, and modern tasks demand higher recognition accuracy. In this paper, we apply a sparseness technique to generate weak classifiers from which we build an ensemble CNN. There is a trade-off between recognition accuracy and inference speed, and we control the sparse (zero-weight) ratio to achieve both high performance and better recognition accuracy. We use P sparse-weight CNNs with a dataflow pipeline architecture that hides the performance overhead of evaluating multiple CNNs in the ensemble. We set an adequate sparse ratio to adjust the number of operation cycles in each stage. The proposed ensemble CNN depends on dataset quality and has different layer configurations. We propose a universal convolution core that realizes variations of modern convolutional operations, and we extend it to many cores with a pipelined architecture to achieve high-throughput operation. GPUs compute sparse convolutions inefficiently, whereas our universal convolution cores realize an architecture with excellent pipeline efficiency. We measure the trade-off between recognition accuracy and inference speed using existing benchmark datasets and CNN models. By setting the sparsity ratio and the number of predictors appropriately, we realize high-speed architectures on the many universal convolution cores while improving recognition accuracy compared with a conventional single CNN. We implemented a prototype of the many universal convolution cores on a Xilinx Kintex UltraScale+ FPGA. Compared with a desktop GPU implementation of the ensemble, the proposed many-core accelerator for the ensemble sparse CNN is 3.09 times faster, consumes 4.20 times less power, and achieves 13.33 times better performance per unit power. Therefore, by realizing the proposed ensemble method with many universal convolution cores, high-speed inference is achieved while recognition accuracy improves compared with a conventional dense-weight CNN on a desktop GPU.
Citations: 1
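The abstract relies on two ideas. Zero weights can be skipped, so the sparse ratio directly sets the MAC count of a stage. The outputs of P weak classifiers are then combined. The sketch below illustrates both in plain Python: a single-channel, CNN-style (cross-correlation) convolution that visits only nonzero weights, and an ensemble combined by averaging class scores. The averaging rule and all names are assumptions; the paper's core architecture is not modeled.

```python
def sparse_conv2d(image, kernel):
    """Valid 2D convolution (CNN-style cross-correlation) that visits only
    nonzero weights; returns the output map and the number of MACs performed."""
    nonzero = [(ky, kx, w)
               for ky, row in enumerate(kernel)
               for kx, w in enumerate(row) if w != 0]
    kh, kw = len(kernel), len(kernel[0])
    oh, ow = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            out[y][x] = sum(w * image[y + ky][x + kx] for ky, kx, w in nonzero)
    macs = oh * ow * len(nonzero)  # a dense kernel would need oh * ow * kh * kw
    return out, macs


def ensemble_average(member_scores):
    """Combine the class scores of P ensemble members by averaging (assumed rule)."""
    p = len(member_scores)
    return [sum(col) / p for col in zip(*member_scores)]
```

With `image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` and the 50%-sparse kernel `[[1, 0], [0, -1]]`, the output is `[[-4, -4], [-4, -4]]` and only 8 MACs are performed instead of 16. Raising the sparse ratio of one ensemble member therefore shortens its pipeline stage in proportion.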
FPGA/Python Co-Design for Lane Line Detection on a PYNQ-Z1 Board
Koki Honda, Kaijie Wei, H. Amano
DOI: 10.1109/MCSoC.2019.00015, published 2019-10-01
Abstract: This paper presents an implementation of lane line detection using an FPGA and Python. Lane line detection consists of three functions: median blur, adaptive thresholding, and the Hough transform. We implemented only the accumulation step of the Hough transform on the FPGA. A direct implementation of the Hough transform does not fit on a low-end FPGA board, but by reducing the ρθ space we implemented it successfully on one. The rest of the Hough transform was implemented in Python using NumPy, SciPy, and OpenCV. Although this part was very easy to write, its efficiency kept it from becoming a bottleneck for the whole process. As a result, we achieved a 3.9x speedup over OpenCV while keeping development cost down. When median blur and adaptive thresholding were also implemented on the FPGA, we achieved a 6.34x speedup.
Citations: 3
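The accumulation step that the authors moved to the FPGA is the classic Hough vote. Each edge point (x, y) increments one counter per sampled angle θ, in the bin for ρ = x·cos θ + y·sin θ. The abstract does not say how the ρθ space was reduced. The sketch below assumes it means fewer θ samples and a coarser ρ step, since the accumulator holds n_theta × n_rho counters. The function names and parameters are hypothetical.

```python
import math

def hough_accumulate(points, n_theta, rho_step, max_rho):
    """Vote edge points into a quantized (theta, rho) accumulator.
    Coarser n_theta / rho_step shrink the table (n_theta * n_rho counters)."""
    n_rho = int(2 * max_rho / rho_step) + 1
    acc = [[0] * n_rho for _ in range(n_theta)]
    trig = [(math.cos(math.pi * i / n_theta), math.sin(math.pi * i / n_theta))
            for i in range(n_theta)]  # theta sampled over [0, pi)
    for x, y in points:
        for i, (c, s) in enumerate(trig):
            rho = x * c + y * s            # assumes |rho| <= max_rho
            acc[i][round((rho + max_rho) / rho_step)] += 1
    return acc


def strongest_line(acc, n_theta, rho_step, max_rho):
    """Return (theta, rho, votes) of the first maximum accumulator cell."""
    best = (0, 0, -1)
    for i, row in enumerate(acc):
        for r, votes in enumerate(row):
            if votes > best[2]:
                best = (i, r, votes)
    i, r, votes = best
    return math.pi * i / n_theta, r * rho_step - max_rho, votes
```

For the vertical line of points `[(5, y) for y in range(10)]` with `n_theta=180`, `rho_step=1`, `max_rho=20`, the strongest cell is θ = 0, ρ = 5 with 10 votes. With these settings the table has 180 × 41 = 7380 counters. Using `n_theta=45` and `rho_step=2` shrinks it to 45 × 21 = 945 counters, at the cost of angular and distance resolution.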
Title Page I
DOI: 10.1109/mcsoc.2019.00001, published 2019-10-01
Citations: 0