2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers

Fault Detection and Localization for Network-on-Chips in Mixed-Criticality Systems
Adele Maleki, Hamidreza Ahmadian, R. Obermaisser
{"title":"Fault Detection and Localization for Network-on-Chips in Mixed-Criticality Systems","authors":"Adele Maleki, Hamidreza Ahmadian, R. Obermaisser","doi":"10.1109/MCSoC.2019.00038","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00038","url":null,"abstract":"The increasing trend towards mixed-critical systems, in which applications with different levels of criticality coexist and interact on the same platform, calls for fault-tolerant hardware platforms. On the other hand, due to the demanded performance in such systems, network-on-chips are employed to interconnect several computation resources. Consequently, the detection and localization of faults in the communication and computation resources becomes a challenge, if a high number of shared resources (e.g., routers, physical links) are used. This paper proposes a new hardware architecture for run-time fault detection and localization in mixed-criticality networks-on-chips. The proposed architecture detects the transient and permanent faults in the network and distinguishes between faults of different resources. The fault detection and localization mechanisms have been evaluated using Gem5 simulation and example scenarios.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"106 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123105385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Data-Driven Scenario-Based Application Mapping for Heterogeneous Many-Core Systems
J. Spieck, S. Wildermann, T. Schwarzer, J. Teich, M. Glaß
{"title":"Data-Driven Scenario-Based Application Mapping for Heterogeneous Many-Core Systems","authors":"J. Spieck, S. Wildermann, T. Schwarzer, J. Teich, M. Glaß","doi":"10.1109/MCSoC.2019.00054","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00054","url":null,"abstract":"For applications whose workload and execution behavior significantly varies with the input, a single mapping of application tasks to a given target architecture is insufficient. A single mapping may deliver a high-quality solution for the average case but rarely exploits the specific execution behavior of concurrent tasks triggered by each input tuple. E.g., tasks with higher computational demands under certain input should be mapped onto high-performance resources of the heterogeneous architecture. This necessitates mappings that are specialized for specific input data. Yet, due to the large size of input combinations, determining a separate optimized mapping for each individual input workload is not feasible for most applications. As a remedy, we propose to group input data with similar execution characteristics into a selected, small number of so-called workload scenarios for which we supply optimized mappings. In this paper, we provide a data-driven approach for detecting workload scenarios and exploring scenario-optimized mappings based on a collection of input data. The identification of scenarios and the determination of optimized mappings are interdependent: For the data-driven identification of workload scenarios, we have to measure the profiles when executing the application with the given input data for different application mappings. However, to come up with scenario-optimized application mappings, the workload scenarios have to be known. We tackle this interdependence problem by proposing a cyclic design methodology that optimizes both aspects in an iterative fashion. 
It is shown that with our approach, the latency of two exemplary applications, a ray tracing as well as an image stitching application, can be significantly improved compared to methods that ignore workload scenarios or do not perform the proposed iterative refinement. Furthermore, we demonstrate that our proposal can be used in the context of a hybrid application mapping methodology.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115992756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops
Amin Majd, Mohammad Loni, Golnaz Sahebi, M. Daneshtalab, E. Troubitsyna
{"title":"A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops","authors":"Amin Majd, Mohammad Loni, Golnaz Sahebi, M. Daneshtalab, E. Troubitsyna","doi":"10.1109/MCSoC.2019.00047","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00047","url":null,"abstract":"Advances in hardware architecture regarding multi-core processors make parallel computing ubiquitous. To achieve the maximum utilization of multi-core processors, parallel programming techniques are required. However, there are several challenges standing in front of parallel programming. These problems are mainly divided into three major groups. First, although recent advancements in parallel programming languages (e.g. MPI, OpenCL, etc.) assist developers, still parallel programming is not desirable for most programmers. The second one belongs to the massive volume of old software and applications, which have been written in serial mode. However, converting millions of line of serial codes to parallel codes is highly time-consuming and requiring huge verification effort. Third, the production of software and applications in parallel mode is very expensive since it needs knowledge and expertise. Super-optimization provided by super compilers is the process of automatically determine the dependent and independent instructions to find any data dependency and loop-free sequence of instructions. Super compiler then runs these instructions on different processors in the parallel mode, if it is possible. Super-optimization is a feasible solution for helping the programmer to get relaxed from parallel programming workload. Since the most complexity of the sequential codes is in the nested loops, we try to parallelize the nested loops by using the idea of super-optimization. One of the underlying stages in the super-optimization is scheduling tiled space for iterating nested loops. Since the problem is NP-Hard, using the traditional optimization methods are not feasible. 
In this paper, we propose a cloud-based super-optimization method as Software-as-a-Service (SaaS) to reduce the cost of parallel programming. In addition, it increases the utilization of the processing capacity of the multi-core processor. As the result, an intermediate programmer can use the whole processing capacity of his/her system without knowing anything about writing parallel codes or super compiler functions by sending the serial code to a cloud server and receiving the parallel version of the code from the cloud server. In this paper, an evolutionary algorithm is leveraged to solve the scheduling problem of tiles. Our proposed super-optimization method will serve as software and provided as a hybrid (public and private) deployment model.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128589525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Real-Time Attitude Estimation of Sigma-Point Kalman Filter via Matrix Operation Accelerator
Zeyang Dai, Lei Jing
{"title":"Real-Time Attitude Estimation of Sigma-Point Kalman Filter via Matrix Operation Accelerator","authors":"Zeyang Dai, Lei Jing","doi":"10.1109/MCSoC.2019.00055","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00055","url":null,"abstract":"Attitude estimation is an important part for navigation of mobile robotics and unmanned aerial vehicle (UAV) control. Although the Extended Kalman Filter (EKF) can be done typically, the trend is to use Sigma-Point Kalman Filter (SPKF) instead due to its higher accuracy and robustness in harsh environment. The only drawback of such system is the higher computation cost. In order to accelerate the system, most approaches based on Field Programmable Gate Arrays (FPGA) are proposed in the past but too specific, which is not reusable and the high price for design complexity. With looking for re-usability, we present an IP core called matrix operation accelerator in this paper. Moreover, we do the verification on Zynq-7020, the experimental result shows that the proposed scheme can reduce about 50% computing time and save silicon as well.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132664935","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Fault-Tolerant Traffic-Aware Routing Algorithm for 3-D Photonic Networks-on-Chip
M. Meyer, Yu Wang, Takahiro Watanabe
{"title":"Fault-Tolerant Traffic-Aware Routing Algorithm for 3-D Photonic Networks-on-Chip","authors":"M. Meyer, Yu Wang, Takahiro Watanabe","doi":"10.1109/MCSoC.2019.00032","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00032","url":null,"abstract":"As the number of cores on a single chip increased, the inter-core communication system quickly became the performance bottleneck. In order to solve the performance and scalability issues of bus-based systems, Network-on-chip (NoC) was proposed. This eventually met its own bottleneck and several technologies sprouted out from NoC research. The most commonly researched upgrade to NoCs was 3D NoCs, which utilized stacked routers to reduce the maximum hop count. Other researchers have looked at alternative transmission mediums, such as photonics. These technologies can be combined to give great performance and power benefits but can be slowed down by congestion in their path-setup phase. In order to solve this issue, we propose a traffic-aware routing algorithm that can evenly distribute the traffic throughout the chip, all while simultaneously avoiding faulty nodes. The results show that the proposed algorithm was successful in balancing the load across the chip and that the performance costs of the algorithm were mostly offset by the benefits of reducing blocked paths.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"43 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132485565","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators
Lu Peng, Wentao Shi, Jian Zhang, Samuel Irving
{"title":"Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators","authors":"Lu Peng, Wentao Shi, Jian Zhang, Samuel Irving","doi":"10.1109/MCSoC.2019.00042","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00042","url":null,"abstract":"Recurrent Neural Networks (RNNs) have continued to facilitate rapid progress in a variety of academic and industrial fields, though their complexity continues to make efficient deployment difficult; when the RNN model size is not properly matched to hardware resources, performance can suffer from hardware under-utilization. In this work, we propose to explore model-level parallelism for LSTM-RNN accelerators in different levels of the model using a multicore design. The multi-core design proposed in this work operates in three computing modes: multi-programming mode in which independent models are executed; multithreading mode in which parallelism among layers of an LSTM model is explored and properly scheduled; and helper-core mode in which cores collaborate on a single LSTM layer in a lower model level comparing with multithread mode. Our design can achieve up to 1.98x speedup in \"multi-programming\" mode, a 1.91x speedup in \"multithreading\" mode and a 1.88x speedup in \"helper-core\" mode over the single-core design.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"28 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125504626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Design-Time Memory Subsystem Optimization for Low-Power Multi-Core Embedded Systems
Manuel Strobel, M. Radetzki
{"title":"Design-Time Memory Subsystem Optimization for Low-Power Multi-Core Embedded Systems","authors":"Manuel Strobel, M. Radetzki","doi":"10.1109/MCSoC.2019.00056","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00056","url":null,"abstract":"Embedded multi-core systems are increasingly in use. As established single-core design methodologies are often not applicable out of the box, novel design-time optimization methods are required in order to manage real-time characteristics, predictability, or tight constraints with respect to energy consumption or system performance. With focus on the memory subsystem in a multi-core embedded system, this paper proposes an optimization workflow for the application-specific optimal binding of code and data to memory instances, efficient handling and scheduling of available memory low-power modes, and the automated and transparent integration of these optimization results on the software level. Presented optimization algorithms are realized as integer linear programs; code modification and generation are implemented on the basis of LLVM. 
Experimental results for an ARM-based quad-core platform with SRAM memory subsystem, consisting of core-local scratchpad memories and global shared memory, prove the efficiency of our method in terms of energy consumption when compared to a system using direct-mapped caches, but also in comparison with a state-of-the-art scratchpad mapping heuristic.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125582181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7
Optimization of Numerous Small Dense-Matrix–Vector Multiplications in H-Matrix Arithmetic on GPU
S. Ohshima, I. Yamazaki, Akihiro Ida, Rio Yokota
{"title":"Optimization of Numerous Small Dense-Matrix–Vector Multiplications in H-Matrix Arithmetic on GPU","authors":"S. Ohshima, I. Yamazaki, Akihiro Ida, Rio Yokota","doi":"10.1109/MCSoC.2019.00009","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00009","url":null,"abstract":"Dense-matrix–vector multiplication is one of the well-known important matrix calculations. This calculation is provided a general matrix–vector multiplication (GEMV) function in the basic linear algebra subprograms (BLAS) libraries for several computation hardware. Traditionally, studies focus one large dense-matrix (the length of each side of the dense matrix is long)–vector multiplication. However, some applications require acceleration of numerous small dense-matrix–vector multiplications. This feature is provided by batched BLAS libraries. This calculation is also needed to compute a hierarchical-matrix–vector multiplication. In this study, we implemented numerous small dense-matrix–vector multiplications on a Pascal GPU and evaluated the performance. Thus, we considered the impact of optimization parameters and succeeded in obtaining a better performance than previous works. The maximum differences from our previous work is 28.47% and from batched GEMV of MAGMA BLAS is upto 81.81%. Moreover, we considered the use of two optimization parameters in one GPU kernel; one parameter was applied to some matrices, whereas the second parameter was applied to other matrices. The amount of the improvement was limited (upto 5%), a performance improvement was achieved. 
Our result will serve as a good reference for users who need to use numerous small dense-matrix–vector multiplications on a GPU and want to optimize a matrix–vector multiplication by hand-tuning and auto-tuning.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126493087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
Algorithm to Determine Extended Edit Distance between Program Codes
Kazuki Anzai, Y. Watanobe
{"title":"Algorithm to Determine Extended Edit Distance between Program Codes","authors":"Kazuki Anzai, Y. Watanobe","doi":"10.1109/MCSoC.2019.00033","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00033","url":null,"abstract":"An algorithm to determine the extended edit distance between program codes is presented. In addition to the conventional Levenshtein distance, the extended edit distance considers some common operations to a program code to find similar programs more accurately. To calculate the distance, the algorithm employs dynamic programming techniques as well as an algorithm for solving the minimum cost flow on a bipartite graph. In this paper, details of the algorithm and experimental results are presented. These experiments were conducted with source code submitted to an online judge system, where a number of source codes for each programming problem are located. The results show that the proposed algorithm can find source code that cannot be found by the conventional Levenshtein distance, with a higher probability.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"6 15","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"113932060","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
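The abstract above takes the conventional Levenshtein distance as its baseline. For orientation, here is a minimal sketch of that baseline only, computed with the standard dynamic-programming recurrence. The function name `levenshtein` and the sample strings are illustrative assumptions. The extended operations and the minimum-cost-flow step of the paper are not described in the abstract, so they are not reproduced here.

```python
def levenshtein(a: str, b: str) -> int:
    """Conventional edit distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the DP row to save memory
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))      # 3
    # Two statements that differ only in operand order still cost two edits.
    print(levenshtein("a = b + c", "a = c + b"))  # 2
```

The second call hints at why a plain character-level distance can overstate the difference between programs that are structurally similar. Which code-level operations the extended distance actually discounts is specified in the paper itself, not here.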
Automatic Generation of Fill-in-the-Blank Programming Problems
Kenta Terada, Y. Watanobe
{"title":"Automatic Generation of Fill-in-the-Blank Programming Problems","authors":"Kenta Terada, Y. Watanobe","doi":"10.1109/MCSoC.2019.00034","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00034","url":null,"abstract":"In solving programming problems, it is difficult for beginners to create program code from scratch. One way to navigate this difficulty is to provide a programming problem to them which takes a fill-in-the-blank format. In this work, we propose a method to automatically generate programming problems that has two key constituents, selection of exemplary source code and selection of places to be blanks. In terms of selecting exemplary source code, k-means clustering with silhouette analysis in the Online Judge System (OJ) is proposed. Regarding the selection of places to be blanks, a model based on a bidirectional Long Short-Term Memory Network (Bi-LSTM) with a sequential Conditional Random Field (CRF) is proposed. We discuss evaluation of the proposed approach in the context of how fill-in-the-blank programming problems are generated.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"202 ","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114051797","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 10