2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC): Latest Papers

Portability of Vectorization-aware Performance Tuning Expertise across System Generations
Shunpei Sugawara, Yoichi Shimomura, Ryusuke Egawa, H. Takizawa
{"title":"Portability of Vectorization-aware Performance Tuning Expertise across System Generations","authors":"Shunpei Sugawara, Yoichi Shimomura, Ryusuke Egawa, H. Takizawa","doi":"10.1109/MCSoC51149.2021.00043","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00043","url":null,"abstract":"Even HPC expert programmers need to invest considerable time and effort in empirically establishing effective performance tuning strategies for their target systems. When the target system is changed and/or updated, it is thus preferable for expert programmers if their performance tuning expertise can be ported to the new system as much as possible. In this paper, we focus on multiple generations of NEC SX series vector systems. We have documented the performance tuning expertise for the previous generations and built a machine-usable database of performance tuning cases. Therefore, this paper investigates how much the recorded expertise in the database can contribute to performance tuning for the latest generation, NEC SX-Aurora TSUBASA (SX-AT). Since the system architecture as well as the software stack such as compilers are totally renewed for SX-AT, this paper discusses the differences in performance tuning across system generations. In addition, this paper also discusses how to express performance tuning techniques in a machine-usable way. 
The case study in this paper indicates that the Xevolver's approach of using user-defined code transformations can express most of the vectorization-aware performance tuning techniques, and is thus promising for recording the performance tuning expertise in a future-proof fashion.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"46 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"134420045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
UI Method to Support Knowledge Creation in Hybrid Museum Experience
Toru Tamahashi, R. Yoshioka, Takayuki Hoshino
{"title":"UI Method to Support Knowledge Creation in Hybrid Museum Experience","authors":"Toru Tamahashi, R. Yoshioka, Takayuki Hoshino","doi":"10.1109/MCSoC51149.2021.00050","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00050","url":null,"abstract":"A user interaction method to support knowledge creation in a hybrid museum experience is proposed and evaluated. The method incorporates a knowledge creation process of visitor experiences to the interaction scheme on the user interface based on two intentions. The first intention is to invoke user actions required for an effective knowledge experience, including individual learning. The second intention is to document the knowledge with sufficient information for sharing and reuse. The method is designed as part of an application for a hybrid museum experience such that the digital device does not distract the visitor from the museum exhibit. This paper presents the proposed UI interaction method, its implementation in the application, and an evaluation study of its effects. The evaluation was conducted with a group of curators to obtain professional feedback on the method's effect on observation behavior and knowledge creation. 
As a result, we found that the user interface of expressing one's own impressions and seeing the impressions of others helped to deepen the understanding of the exhibits.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"196 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114867350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms
Aurelien Bloch, S. Brunet, M. Mattavelli
{"title":"Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms","authors":"Aurelien Bloch, S. Brunet, M. Mattavelli","doi":"10.1109/MCSoC51149.2021.00018","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00018","url":null,"abstract":"The performance of programs written in languages following the dataflow model of computation (MoC) largely depends on the configuration (partitioning, mapping, scheduling, buffer dimensioning) chosen during the synthesis stages. Furthermore, this programming paradigm is particularly well suited for heterogeneous parallel systems because it is inherently free of memory contention and exposes parallel opportunities. Both of these statements show the necessity for a way to easily and automatically evaluate and find good design configurations. The paper describes the methodology required for clock-accurate profiling of high-level dataflow programs written in RVL-CAL when synthesized on heterogeneous CPU/GPU co-processing platforms. It also extends to the heterogeneous paradigm an existing methodology for qualitatively estimating the performance of such programs as a function of the provided configuration, without the need to synthesize and profile every single configuration on the actual hardware platform. 
This approach is validated using two application programs and several configurations.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"124366462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A Memory-Access-Minimized BCNN Accelerator Using Nonvolatile FPGA with Only-Once-Write Shifting
D. Suzuki, Takahiro Oka, T. Hanyu
{"title":"A Memory-Access-Minimized BCNN Accelerator Using Nonvolatile FPGA with Only-Once-Write Shifting","authors":"D. Suzuki, Takahiro Oka, T. Hanyu","doi":"10.1109/MCSoC51149.2021.00021","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00021","url":null,"abstract":"A binary convolutional neural network (BCNN) accelerator using a nonvolatile field-programmable gate array (NV-FPGA) with only-once-write shifting is presented. During the basic operation of the BCNN, the feature maps and weights are read from the block RAM (BRAM) and serially transferred to processing elements. The use of only-once-write shifting makes it possible to greatly reduce the write power consumption of such serial data transfer in the NV-FPGA. Meanwhile, since the BCNN computation is composed of nested loops, the memory access potentially has temporal locality. This means that once the data is read from the BRAM, it can be reused among several layers. By focusing on this feature and performing loop interchange, the number of memory accesses can be minimized and the idle time maximized. If the BRAM is nonvolatile, wasted standby energy consumption during the idle state is completely eliminated by the use of a power-gating technique. 
As a result, the proposed BCNN accelerator consumes 66.5% less energy than a conventional volatile-FPGA-based BCNN accelerator in a typical digit recognition task with the MNIST dataset.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"15 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127967456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Accelerated on-Chip Algorithm Based on Semantic Region-Based Partial Difference Detection for LiDAR-Vision Depth Data Transmission Reduction in Lightweight Controller Systems of Autonomous Vehicle
Dong-gill Jung, Dae-Geun Park
{"title":"Accelerated on-Chip Algorithm Based on Semantic Region-Based Partial Difference Detection for LiDAR-Vision Depth Data Transmission Reduction in Lightweight Controller Systems of Autonomous Vehicle","authors":"Dong-gill Jung, Dae-Geun Park","doi":"10.1109/MCSoC51149.2021.00011","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00011","url":null,"abstract":"LiDAR sensors are one type of sensor used in autonomous driving vehicles that obtain distance data through the flight time of light. A LiDAR sensor can measure data at high speeds, and the precision of the data is higher than with other sensors. A large amount of data per sensing time is transmitted from sensors. Autonomous driving vehicles use many electronic devices, so the data channels they use and the domain control unit resources that control the system are limited. In this environment, if LiDAR sensor data can be reduced without compromising the original data, it can have a quite positive impact on autonomous vehicle systems. In this paper, we propose a differential partial update for data reduction of LiDAR sensors and a semantic detection to eliminate the resulting noise and increase the reliability of the data. The sensor processor extracts only the changed parts of the continuous distance data, excluding the same parts, and transmits them to the host. The high-difference noise is eliminated by filtering through a window-sliding operation. Semantic detection marks only parts that change and detects movement in the field of view. Simple differential partial updates reduce the amount of data by 59.31% based on a simple case. A semantic detection partial update can reduce the amount of data by 83.41%. 
This process can also reduce computing time by 61.36% with graphics processing unit acceleration.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"34 5 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"132761948","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Text Compression Based on an Alternative Approach of Run-Length Coding Using Burrows-Wheeler Transform and Arithmetic Coding
Md. Atiqur Rahman, Mohamed Hamada, Md. Asfaqur Rahman
{"title":"Text Compression Based on an Alternative Approach of Run-Length Coding Using Burrows-Wheeler Transform and Arithmetic Coding","authors":"Md.Atiqur Rahman, Mohamed Hamada, Md. Asfaqur Rahman","doi":"10.1109/MCSoC51149.2021.00049","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00049","url":null,"abstract":"In modern life, communication via text is becoming one of the most popular means of communication. As a result, storing text in a small format or transferring it quickly over the internet has become a challenging issue, and text compression has become an important research field. Many algorithms for text compression have already been developed, and new algorithms are being devised to fulfil the demands of current technology. This research article proposes a text compression technique based on: (i) the Burrows-Wheeler transform; (ii) an alternative method of run-length coding; (iii) finding repeated patterns more frequently; and (iv) arithmetic coding. The proposed approach is compared with other state-of-the-art methods, and gives better performance in terms of compression ratios.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"27 3","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"133686249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Data Fusion Driven Lane-level Precision Data Transmission for V2X Road Applications
Albert Budi Christian, Chih-Yu Lin, Lan-Da Van, Y. Tseng
{"title":"Data Fusion Driven Lane-level Precision Data Transmission for V2X Road Applications","authors":"Albert Budi Christian, Chih-Yu Lin, Lan-Da Van, Y. Tseng","doi":"10.1109/MCSoC51149.2021.00031","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00031","url":null,"abstract":"Inter-vehicle communication is being developed continuously in order to accomplish a better driving experience. Through the exchange of information between vehicles and the Road Side Unit (RSU), the number of accidents can be reduced by notifying the driver of the facts obtained. In general, broadcast information for vehicles is sent in an ad hoc manner. However, unfiltered information may be useless and wasted for most vehicles. Thus, the question arises whether precise information can be delivered only to the target vehicles without interfering with other non-target vehicles. A computer vision (CV) and sensor fusion-based transmission system, in which data are exchanged between the RSU and the vehicle On-board Unit (OBU), is developed to attain this objective. In order to correctly transmit the specific information to the target vehicles, we propose a data fusion driven lane-level precision data transmission system that utilizes three kinds of sensory inputs: Road Side Camera (RSC), GPS, and magnetometer. By combining common features from these sensory inputs, our system is able to select the receiver of specific information on the road. Our system focuses on the scenario where a message can be transmitted to the target vehicles located in a certain lane. 
The experimental evaluation shows a recognition rate of 87.34% and the generated messages have a total delay less than 72 ms.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"131081192","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Variable Bit-Precision Vector Extension for RISC-V Based Processors
RK Risikesh, Sharad Sinha, N. Rao
{"title":"Variable Bit-Precision Vector Extension for RISC-V Based Processors","authors":"RK Risikesh, Sharad Sinha, N. Rao","doi":"10.1109/MCSoC51149.2021.00024","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00024","url":null,"abstract":"Neural Network model execution is becoming an increasingly compute intensive task. With advances in optimisation techniques such as using lower-bit width precision, need for quantization and model compression, we need to find efficient ways of implementing these techniques. Most Instruction Set Architectures (ISAs) do not support low bit-width vector instructions. In this work, we present an extension for the vector specification of the RISC-V ISA, which is targeted towards supporting the lower bit-widths or variable precision (1 to 16 bits) Multiply and Accumulate (MAC) operations. We demonstrate our proposed ISA extension by integrating it with a RISC-V processor named PicoRV32, which is considered as the baseline processor in the proposed work. We introduce the feature of bit-serial multiplication along with variable bit precision support to demonstrate the advantage over a 16 bit baseline processor model. We also build an assembler for the proposed instructions for easier integration into the testbench of the RTL model. We implement the processor onto a Xilinx Zynq based FPGA. We observe that, compared to the baseline RISC-V Vector processor which only supports 8, 16 and 32-bit vector instructions, our processor with variable precision support (1 to 16 bits) performs 1.14x faster on average on a matrix multiplication test program. 
The proposed processor architecture reduces the memory footprint by up to 1.88x as compared with a baseline 16-bit vector processor.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"14 1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128645511","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Parasitic-Aware Modelling for Neural Networks Implemented with Memristor Crossbar Array
T. Cao, Chen Liu, Yuan Gao, W. Goh
{"title":"Parasitic-Aware Modelling for Neural Networks Implemented with Memristor Crossbar Array","authors":"T. Cao, Chen Liu, Yuan Gao, W. Goh","doi":"10.1109/MCSoC51149.2021.00025","DOIUrl":"https://doi.org/10.1109/MCSoC51149.2021.00025","url":null,"abstract":"This paper presents a parasitic-aware modelling approach called αβ-matrix model for the simulation of neural networks (NNs) implemented with memristor crossbar arrays. The line resistance, which is the key parasitic in a memristor crossbar array, is analyzed and incorporated into the model. The proposed method estimates the line resistance IR drop with computation complexity of O(mn), in contrast to O(m^2 n^2) required by the classical matrix based Kirchhoff's Current Law (KCL) equations solver. The impact of the crossbar array parasitics on the vector-matrix multiplication (VMM) computation and multi-layer NN classification accuracy is also analyzed. The advantages of the proposed parasitic-aware model are demonstrated through an example of a 2-layer perceptron implemented with resistive random access memory (RRAM) crossbar arrays for MNIST written digits classification. 97.3% classification accuracy is achieved on 64×64 6-bit RRAM crossbar arrays. Compared to the KCL solver, the classification accuracy degradation is less than 0.4% with line resistance up to 4.5Ω.","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"13 6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116216236","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
[Title page]
{"title":"[Title page]","authors":"","doi":"10.1109/mcsoc51149.2021.00002","DOIUrl":"https://doi.org/10.1109/mcsoc51149.2021.00002","url":null,"abstract":"","PeriodicalId":166811,"journal":{"name":"2021 IEEE 14th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123745262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0