Title: Host Bypassing: Direct Data Piping from the Network to the Hardware Accelerator
Authors: Ralf Kundel, Kadir Eryigit, Jonas Markussen, C. Griwodz, Osama Abboud, Rhaban Hark, R. Steinmetz
DOI: https://doi.org/10.1109/MCSoC51149.2021.00012
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: Computer networks have grown in importance in recent years, both for common services such as Internet connectivity and for time-sensitive applications such as videotelephony. Furthermore, approaches like in-network computing enable the offloading of latency-critical, high-performance network functions, e.g. 5G network functions, into the network to support such time-sensitive applications. In this work, we show how FPGAs in PCIe-based systems, which are typically used as hardware accelerators for latency-critical in-network functions, can be integrated into the data path. Our approach, named host bypassing, allows direct data transfer from the network interface to the accelerator and achieves substantial performance benefits over existing state-of-the-art approaches. Our detailed evaluation demonstrates that deterministic low latency can be achieved under heavy load without any packet loss, while requiring fewer CPU resources.

Title: Parallel Implementation of CNN on Multi-FPGA Cluster
Authors: Yasuyu Fukushima, Kensuke Iizuka, H. Amano
DOI: https://doi.org/10.1109/MCSoC51149.2021.00019
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: We developed M-KUBOS, a PYNQ cluster consisting of economical Zynq boards interconnected by low-cost, high-performance GTH serial links. For the software environment, we employed the open-source PYNQ platform. The cluster is intended to serve as a multi-access edge computing (MEC) server for 5G mobile networks. We implemented a ResNet-50 inference accelerator on the cluster for image recognition in MEC applications. By estimating the execution time of each ResNet-50 layer, the layers were divided across four boards so that the execution time of each board would be as equal as possible for efficient pipeline processing. Because the FPGAs in the cluster are directly connected by high-speed serial links, stream processing without network bottlenecks and pipeline processing between boards were readily realized. The implementation achieved 292 GOPS performance, 75.1 FPS throughput, and 5.15 GOPS/W power efficiency: 17 times faster and 86 times more power-efficient than a CPU implementation, and 3.8 times more power-efficient than a GPU implementation.

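The per-board partitioning step described above (splitting the network's layers into contiguous groups so that the slowest board, i.e. the pipeline bottleneck, is as fast as possible) can be sketched as a small search. This is an illustrative sketch only; the layer times below are hypothetical, not measurements from the paper.

```python
from itertools import combinations

def balanced_split(layer_times, boards):
    """Split layer_times into `boards` contiguous groups, minimizing the
    slowest group's total time (the pipeline bottleneck)."""
    n = len(layer_times)
    best_bounds, best_bottleneck = None, float("inf")
    # Try every placement of (boards - 1) cut points between layers.
    for cuts in combinations(range(1, n), boards - 1):
        bounds = (0,) + cuts + (n,)
        bottleneck = max(sum(layer_times[a:b]) for a, b in zip(bounds, bounds[1:]))
        if bottleneck < best_bottleneck:
            best_bounds, best_bottleneck = bounds, bottleneck
    groups = [layer_times[a:b] for a, b in zip(best_bounds, best_bounds[1:])]
    return groups, best_bottleneck

# Hypothetical per-layer execution times (ms), not from the paper.
times = [4, 7, 3, 8, 2, 6, 5, 9, 1, 4]
groups, bottleneck = balanced_split(times, 4)
```

With pipelined boards, throughput is governed by `bottleneck` rather than the total, which is why the paper balances the four boards' execution times.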
Title: Sparse Matrix Ordering Method with a Quantum Annealing Approach and its Parameter Tuning
Authors: Tomoko Komiyama, Tomohiro Suzuki
DOI: https://doi.org/10.1109/MCSoC51149.2021.00045
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: Quantum annealing realizes quantum computers specialized for combinatorial optimization problems (COPs). A COP is formulated as a Hamiltonian, and quantum annealing obtains a solution by finding the ground state of that Hamiltonian. The ease of finding a solution depends on the weights assigned to the cost and constraint functions when formulating the problem; in other words, parameter tuning is essential when solving problems with quantum annealing. In this paper, the problem of searching for an ordering that reduces the fill-in of a sparse direct solver is formulated as a Hamiltonian, and quantum annealing finds the solution. We discuss the necessity and effectiveness of parameter tuning for solving COPs with quantum annealing. The results after weight tuning show that the rate at which an optimal solution is obtained improves by up to 94% for $5 \times 5$ matrices, 68% for $6 \times 6$ matrices, and 27% for $7 \times 7$ matrices. Moreover, it is shown that assigning excessively high weights to the constraints we want to satisfy does not yield an optimal solution.

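The weight-tuning issue discussed above can be seen in a toy Hamiltonian: a linear cost plus a quadratic penalty enforcing a one-hot constraint. This sketch uses exhaustive search in place of an annealer, and the costs and weights are made-up values, not the paper's fill-in formulation.

```python
import itertools

def qubo_ground_state(n, energy):
    """Exhaustively find the lowest-energy bitstring (what annealing approximates)."""
    return min(itertools.product([0, 1], repeat=n), key=energy)

# Toy Hamiltonian: cost sum(c_i * x_i) plus penalty B * (sum(x_i) - 1)^2,
# enforcing "exactly one bit set". Costs are hypothetical.
costs = [3.0, 1.0, 2.0]

def make_energy(B):
    def energy(x):
        return sum(c * xi for c, xi in zip(costs, x)) + B * (sum(x) - 1) ** 2
    return energy

weak = qubo_ground_state(3, make_energy(0.5))    # penalty too weak
strong = qubo_ground_state(3, make_energy(10.0))  # constraint enforced
```

With the weak penalty, the ground state is the infeasible all-zero string (violating the constraint is cheaper than paying any cost); with a sufficiently large weight, the ground state is the feasible minimum-cost assignment. This is exactly why the constraint weight must be tuned relative to the cost function.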
Title: Ising-Based Combinatorial Clustering Using the Kernel Method
Authors: Masahito Kumagai, K. Komatsu, Masayuki Sato, Hiroaki Kobayashi
DOI: https://doi.org/10.1109/MCSoC51149.2021.00037
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: Combinatorial clustering based on the Ising model is attracting attention as a method for obtaining high-quality clustering results. Furthermore, combinatorial clustering with the kernel method can handle irregular data of any type via the kernel trick, which extends the data to an arbitrary high-dimensional feature space by switching the kernel function. However, conventional kernel clustering based on the Ising model is limited to the case of exactly two clusters, because the Ising model is composed of binary decision variables. This paper proposes Ising-based combinatorial clustering using the kernel method that can handle two or more clusters. The key idea is to represent clustering results with one-hot encoding, which represents the cluster to which a single data point belongs using as many bits as there are clusters. However, the one-hot constraint introduced by this encoding degrades clustering quality; to address this problem, combinatorial clustering based on an externally defined one-hot constraint is used. Since the proposed kernel-based method works with more than two clusters, it is compared against conventional Euclidean distance-based combinatorial clustering, which also divides the data into two or more clusters. Experiments show that for irregular data, the clustering quality of the proposed method is significantly better than that of the conventional method.

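The one-hot encoding at the heart of the method above is simple to state concretely: each of the n data points gets k bits, exactly one of which is set, and the one-hot constraint is typically imposed as a quadratic penalty. A minimal sketch (function names are illustrative, not from the paper):

```python
def one_hot_encode(assignment, k):
    """Represent each point's cluster id (0..k-1) as k bits, exactly one set."""
    return [[1 if cluster == j else 0 for j in range(k)] for cluster in assignment]

def one_hot_penalty(bits):
    """Quadratic penalty sum_i (sum_j x_ij - 1)^2; zero iff every row is one-hot."""
    return sum((sum(row) - 1) ** 2 for row in bits)

# Four points assigned to three clusters.
enc = one_hot_encode([0, 2, 1, 2], k=3)
```

A valid encoding has zero penalty; any row with no bit set or more than one bit set contributes a positive term, which is how the constraint is folded into the Ising/QUBO objective.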
Title: Task-level Redundancy vs Instruction-level Redundancy against Single Event Upsets in Real-time DAG scheduling
Authors: L. Miedema, Benjamin Rouxel, C. Grelck
DOI: https://doi.org/10.1109/MCSoC51149.2021.00062
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: Real-time cyber-physical systems have become ubiquitous. As such systems are often mission-critical, designers must include mitigations against various types of hardware faults, including Single Event Upsets (SEUs). SEUs can be mitigated using both software and hardware approaches; with software approaches, the application designer must select the appropriate redundancy level for the application. We propose the use of task-level redundancy for SEU detection, aimed at applications structured as a Directed Acyclic Graph (DAG) of tasks. This work compares existing instruction-level redundancy against task-level redundancy using the UPPAAL model-checking tool in SMC mode. Our comparison shows that task-level redundancy implemented with Dual Modular Spatial Redundancy and Checkpoint-Restart offers significantly lower deadline-miss ratios when slack is limited. While task-level redundancy usually performs better or equally well, we also show that rare cases exist where long-running DAG applications benefit more from instruction-level redundancy.

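The task-level scheme evaluated above, dual modular redundancy combined with checkpoint/restart, can be sketched as "run the task twice from the same checkpoint, accept if the outputs agree, otherwise roll back and retry". This sketch is only illustrative of the mechanism; the `task` interface and retry policy are assumptions, not the paper's UPPAAL models.

```python
def run_with_dmr(task, state, max_retries=3):
    """Dual modular redundancy with checkpoint/restart: execute the task
    twice from the same checkpointed state; a mismatch signals a possible
    SEU, so roll back to the checkpoint and retry."""
    for _ in range(max_retries):
        checkpoint = dict(state)       # save state before execution
        a = task(dict(checkpoint))     # first execution
        b = task(dict(checkpoint))     # redundant second execution
        if a == b:                     # outputs agree: accept the result
            return a
        state = checkpoint             # mismatch: restart from checkpoint
    raise RuntimeError("persistent divergence; fault may not be transient")

result = run_with_dmr(lambda s: s["x"] * 2, {"x": 3})
```

The deadline-miss trade-off in the paper follows directly from this structure: each detection costs a full duplicate execution, and each recovery costs a re-run, so the scheme needs schedule slack to absorb retries.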
Title: SIMD Parallel Execution on GPU from High-Level Dataflow Synthesis
Authors: Aurelien Bloch, S. Brunet, M. Mattavelli
DOI: https://doi.org/10.1109/MCSoC51149.2021.00017
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: Writing and optimizing application software for heterogeneous platforms that include GPUs is a difficult task, requiring considerable designer effort and resources to obtain good performance. Dataflow programming has proven to be a good approach to this task thanks to its portability and the possibility of arbitrarily partitioning a dataflow network across the units of a heterogeneous platform. However, this design methodology is not by itself sufficient for good performance. The paper describes methodological steps for improving the performance of dataflow programs written in RVC-CAL and synthesized for execution on heterogeneous CPU/GPU co-processing platforms. These steps include optimizing the performance of communication between processing elements, a strategy for efficiently scheduling independent GPU partitions, and the introduction of dynamic programming to leverage the SIMD nature of GPU platforms. The approach is validated qualitatively and quantitatively on example dataflow application programs executed under several partitioning configurations.

Title: Performance Comparision of TPU, GPU, CPU on Google Colaboratory Over Distributed Deep Learning
Authors: H. Kimm, Incheon Paik, Hanke Kimm
DOI: https://doi.org/10.1109/MCSoC51149.2021.00053
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: Deep learning models need massive amounts of compute power and tend to perform better on special-purpose accelerators designed to speed up compute-intensive applications. Accelerators such as Tensor Processing Units (TPUs) and Graphics Processing Units (GPUs) are widely used deep learning hardware platforms that can often outperform CPUs thanks to their massive parallel execution resources and high memory bandwidth. Google Colaboratory (Colab) is a cloud service based on Jupyter Notebook that lets users write and execute (mostly Python) code in a browser and grants free access to TPUs and GPUs without extra configuration, making these hardware platforms widely available in the cloud. In this paper, we present a thorough comparison of the hardware platforms on Google Colab, benchmarked with distributed bidirectional long short-term memory (dBLSTM) models while varying the number of layers, the number of units per layer, and the numbers of input and output units of the datasets. Human Activity Recognition (HAR) data from the UCI machine-learning repository are applied to the proposed distributed bidirectional LSTM model to assess the performance, strengths, and bottlenecks of the TPU, GPU, and CPU platforms with respect to hyperparameters, execution time, and the evaluation metrics accuracy, precision, recall, and F1 score.

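The four evaluation metrics used in the comparison above are standard functions of the confusion matrix. As a reference point, a minimal sketch for the binary case (the multi-class HAR setting would average these per class):

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from binary (0/1) labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = binary_metrics([1, 1, 0, 0], [1, 0, 0, 0])
```

Note that execution time is measured separately from these quality metrics: the model quality should be (near-)identical across TPU, GPU, and CPU, so the hardware comparison is primarily about training and inference speed.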
Title: Distributed Neural Network with TensorFlow on Human Activity Recognition Over Multicore TPU
Authors: H. Kimm, Incheon Paik
DOI: https://doi.org/10.1109/MCSoC51149.2021.00026
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: There has been increasing interest in, and success with, applying deep neural networks to big data platforms and workflows, known as distributed deep learning. In this paper, we present a distributed long short-term memory (dLSTM) neural network model using TensorFlow over a multicore Tensor Processing Unit (TPU) on Google Cloud. LSTM is a variant of the recurrent neural network (RNN) that is more suitable for processing temporal sequences. This model extracts human activity features automatically and classifies them with few model parameters. In the proposed model, raw data collected by mobile sensors is fed into distributed multi-layer LSTM layers. Human activity recognition data from the UCI machine-learning repository are applied to the proposed dLSTM model to compare the efficiency of TensorFlow on CPU and TPU in terms of execution time and the evaluation metrics accuracy, precision, recall, and F1 score, using a Google Colab notebook.

Title: Multiport Register File Design for High-Performance Embedded Cores
Authors: J. Kadomoto, H. Irie, S. Sakai
DOI: https://doi.org/10.1109/MCSoC51149.2021.00048
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: As the application areas of embedded SoCs continue to expand, there is a need to adopt general-purpose cores with higher performance. One method of achieving higher performance in general-purpose processors is superscalar execution, which exploits instruction-level parallelism by simultaneously executing multiple instructions. As the number of parallel execution lanes increases, more ports are required in the internal memory structures, including the register file, to enable reading or writing multiple data in parallel. As the number of ports increases, the power consumption and area of the register file grow and the design becomes exceedingly complex. Therefore, an elaborate design space exploration of such register files is crucial for developing higher-performance cores. In this paper, we discuss the design of multiport register files, especially for 32-bit out-of-order superscalar processors, and investigate the design space through SPICE simulations.

Title: The Role of Linear Discriminant Analysis for Accurate Prediction of Breast Cancer
Authors: Egwom Onyinyechi Jessica, Mohamed Hamada, S. Yusuf, Mohammed Hassan
DOI: https://doi.org/10.1109/MCSoC51149.2021.00057
Published in: 2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC), December 2021
Abstract: With recent advances in clinical technologies, a huge amount of data has been accumulated for breast cancer diagnosis. Extracting information from these data to support clinical diagnosis is a tedious and time-consuming task, and the use of machine learning and data mining techniques has significantly changed the whole diagnosis process. In this research, a model for breast cancer prediction is developed using features extracted from individual medical screenings and tests. To overcome overfitting and obtain good prediction accuracy, Linear Discriminant Analysis (LDA) is applied to extract useful features and reduce the number of features in the experimental dataset. The model creates new features from the existing ones and then discards the originals; the new features summarize the information initially contained in the original feature set. LDA was chosen for its usefulness in detecting whether a set of features is worthwhile for predicting breast cancer. In addition to LDA, the model uses a Support Vector Machine (SVM) for the final prediction, hence the name LDA-SVM. Under 5-fold cross-validation, the proposed model yields an accuracy of 99.2%, precision of 98.0%, and recall of 99.0% on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset from the University of California, Irvine machine learning repository. SVM thus shows high efficiency in handling classification problems when combined with feature extraction techniques.

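The LDA-then-SVM pipeline described above maps naturally onto a scikit-learn pipeline with 5-fold cross-validation. This is a minimal sketch, assuming scikit-learn is available; a synthetic 30-feature dataset stands in for WDBC, so the scores are not the paper's results.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the 30-feature binary WDBC data.
X, y = make_classification(n_samples=300, n_features=30, n_informative=10,
                           random_state=0)

model = make_pipeline(
    # For a binary task, LDA can project to at most 1 discriminant component,
    # which is the dimensionality reduction the paper relies on.
    LinearDiscriminantAnalysis(n_components=1),
    SVC(kernel="linear"),
)
scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")  # 5-fold CV
```

Fitting LDA inside the pipeline (rather than on the full dataset beforehand) matters: it keeps each cross-validation fold's test data out of the feature-extraction step, avoiding optimistic accuracy estimates.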