2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)最新文献

Fault Detection and Localization for Network-on-Chips in Mixed-Criticality Systems 片上网络混合临界系统故障检测与定位

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00038

Adele Maleki, Hamidreza Ahmadian, R. Obermaisser

引用次数: 1

Data-Driven Scenario-Based Application Mapping for Heterogeneous Many-Core Systems 异构多核系统数据驱动的基于场景的应用映射

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00054

J. Spieck, S. Wildermann, T. Schwarzer, J. Teich, M. Glaß

{"title":"Data-Driven Scenario-Based Application Mapping for Heterogeneous Many-Core Systems","authors":"J. Spieck, S. Wildermann, T. Schwarzer, J. Teich, M. Glaß","doi":"10.1109/MCSoC.2019.00054","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00054","url":null,"abstract":"For applications whose workload and execution behavior significantly varies with the input, a single mapping of application tasks to a given target architecture is insufficient. A single mapping may deliver a high-quality solution for the average case but rarely exploits the specific execution behavior of concurrent tasks triggered by each input tuple. E.g., tasks with higher computational demands under certain input should be mapped onto high-performance resources of the heterogeneous architecture. This necessitates mappings that are specialized for specific input data. Yet, due to the large size of input combinations, determining a separate optimized mapping for each individual input workload is not feasible for most applications. As a remedy, we propose to group input data with similar execution characteristics into a selected, small number of so-called workload scenarios for which we supply optimized mappings. In this paper, we provide a data-driven approach for detecting workload scenarios and exploring scenario-optimized mappings based on a collection of input data. The identification of scenarios and the determination of optimized mappings are interdependent: For the data-driven identification of workload scenarios, we have to measure the profiles when executing the application with the given input data for different application mappings. However, to come up with scenario-optimized application mappings, the workload scenarios have to be known. We tackle this interdependence problem by proposing a cyclic design methodology that optimizes both aspects in an iterative fashion. It is shown that with our approach, the latency of two exemplary applications, a ray tracing as well as an image stitching application, can be significantly improved compared to methods that ignore workload scenarios or do not perform the proposed iterative refinement. Furthermore, we demonstrate that our proposal can be used in the context of a hybrid application mapping methodology.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"158 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"115992756","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 6

A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops 一种基于云的并行化顺序代码嵌套循环的超级优化方法

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00047

Amin Majd, Mohammad Loni, Golnaz Sahebi, M. Daneshtalab, E. Troubitsyna

{"title":"A Cloud Based Super-Optimization Method to Parallelize the Sequential Code's Nested Loops","authors":"Amin Majd, Mohammad Loni, Golnaz Sahebi, M. Daneshtalab, E. Troubitsyna","doi":"10.1109/MCSoC.2019.00047","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00047","url":null,"abstract":"Advances in hardware architecture regarding multi-core processors make parallel computing ubiquitous. To achieve the maximum utilization of multi-core processors, parallel programming techniques are required. However, there are several challenges standing in front of parallel programming. These problems are mainly divided into three major groups. First, although recent advancements in parallel programming languages (e.g. MPI, OpenCL, etc.) assist developers, still parallel programming is not desirable for most programmers. The second one belongs to the massive volume of old software and applications, which have been written in serial mode. However, converting millions of line of serial codes to parallel codes is highly time-consuming and requiring huge verification effort. Third, the production of software and applications in parallel mode is very expensive since it needs knowledge and expertise. Super-optimization provided by super compilers is the process of automatically determine the dependent and independent instructions to find any data dependency and loop-free sequence of instructions. Super compiler then runs these instructions on different processors in the parallel mode, if it is possible. Super-optimization is a feasible solution for helping the programmer to get relaxed from parallel programming workload. Since the most complexity of the sequential codes is in the nested loops, we try to parallelize the nested loops by using the idea of super-optimization. One of the underlying stages in the super-optimization is scheduling tiled space for iterating nested loops. Since the problem is NP-Hard, using the traditional optimization methods are not feasible. In this paper, we propose a cloud-based super-optimization method as Software-as-a-Service (SaaS) to reduce the cost of parallel programming. In addition, it increases the utilization of the processing capacity of the multi-core processor. As the result, an intermediate programmer can use the whole processing capacity of his/her system without knowing anything about writing parallel codes or super compiler functions by sending the serial code to a cloud server and receiving the parallel version of the code from the cloud server. In this paper, an evolutionary algorithm is leveraged to solve the scheduling problem of tiles. Our proposed super-optimization method will serve as software and provided as a hybrid (public and private) deployment model.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"61 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128589525","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 2

Real-Time Attitude Estimation of Sigma-Point Kalman Filter via Matrix Operation Accelerator 基于矩阵运算加速器的Sigma-Point卡尔曼滤波器实时姿态估计

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00055

Zeyang Dai, Lei Jing

引用次数: 4

Fault-Tolerant Traffic-Aware Routing Algorithm for 3-D Photonic Networks-on-Chip 片上三维光子网络容错流量感知路由算法

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00032

M. Meyer, Yu Wang, Takahiro Watanabe

引用次数: 3

Exploiting Model-Level Parallelism in Recurrent Neural Network Accelerators 循环神经网络加速器中模型级并行性的开发

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00042

Lu Peng, Wentao Shi, Jian Zhang, Samuel Irving

引用次数: 7

Design-Time Memory Subsystem Optimization for Low-Power Multi-Core Embedded Systems 低功耗多核嵌入式系统的设计时内存子系统优化

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00056

Manuel Strobel, M. Radetzki

{"title":"Design-Time Memory Subsystem Optimization for Low-Power Multi-Core Embedded Systems","authors":"Manuel Strobel, M. Radetzki","doi":"10.1109/MCSoC.2019.00056","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00056","url":null,"abstract":"Embedded multi-core systems are increasingly in use. As established single-core design methodologies are often not applicable out of the box, novel design-time optimization methods are required in order to manage real-time characteristics, predictability, or tight constraints with respect to energy consumption or system performance. With focus on the memory subsystem in a multi-core embedded system, this paper proposes an optimization workflow for the application-specific optimal binding of code and data to memory instances, efficient handling and scheduling of available memory low-power modes, and the automated and transparent integration of these optimization results on the software level. Presented optimization algorithms are realized as integer linear programs; code modification and generation are implemented on the basis of LLVM. Experimental results for an ARM-based quad-core platform with SRAM memory subsystem, consisting of core-local scratchpad memories and global shared memory, prove the efficiency of our method in terms of energy consumption when compared to a system using direct-mapped caches, but also in comparison with a state-of-the-art scratchpad mapping heuristic.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"45 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"125582181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 7

Optimization of Numerous Small Dense-Matrix–Vector Multiplications in H-Matrix Arithmetic on GPU GPU上h -矩阵算法中大量小密度矩阵向量乘法的优化

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00009

S. Ohshima, I. Yamazaki, Akihiro Ida, Rio Yokota

{"title":"Optimization of Numerous Small Dense-Matrix–Vector Multiplications in H-Matrix Arithmetic on GPU","authors":"S. Ohshima, I. Yamazaki, Akihiro Ida, Rio Yokota","doi":"10.1109/MCSoC.2019.00009","DOIUrl":"https://doi.org/10.1109/MCSoC.2019.00009","url":null,"abstract":"Dense-matrix–vector multiplication is one of the well-known important matrix calculations. This calculation is provided a general matrix–vector multiplication (GEMV) function in the basic linear algebra subprograms (BLAS) libraries for several computation hardware. Traditionally, studies focus one large dense-matrix (the length of each side of the dense matrix is long)–vector multiplication. However, some applications require acceleration of numerous small dense-matrix–vector multiplications. This feature is provided by batched BLAS libraries. This calculation is also needed to compute a hierarchical-matrix–vector multiplication. In this study, we implemented numerous small dense-matrix–vector multiplications on a Pascal GPU and evaluated the performance. Thus, we considered the impact of optimization parameters and succeeded in obtaining a better performance than previous works. The maximum differences from our previous work is 28.47% and from batched GEMV of MAGMA BLAS is upto 81.81%. Moreover, we considered the use of two optimization parameters in one GPU kernel; one parameter was applied to some matrices, whereas the second parameter was applied to other matrices. The amount of the improvement was limited (upto 5%), a performance improvement was achieved. Our result will serve as a good reference for users who need to use numerous small dense-matrix–vector multiplications on a GPU and want to optimize a matrix–vector multiplication by hand-tuning and auto-tuning.","PeriodicalId":104240,"journal":{"name":"2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"51 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126493087","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

引用次数: 4

Algorithm to Determine Extended Edit Distance between Program Codes 确定程序代码之间扩展编辑距离的算法

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00033

Kazuki Anzai, Y. Watanobe

引用次数: 6

Automatic Generation of Fill-in-the-Blank Programming Problems 填空编程问题的自动生成

2019 IEEE 13th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC) Pub Date : 2019-10-01 DOI: 10.1109/MCSoC.2019.00034

Kenta Terada, Y. Watanobe

引用次数: 10