IPSJ Transactions on System LSI Design Methodology: Latest Articles, Page 3

Shift Register Initialization in Scalar Replacement for Reducing Code Size

IPSJ Transactions on System LSI Design Methodology Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.2

Kenshu Seto

Pages: 2-9

Abstract: Scalar replacement is an effective technique to improve the performance of the RTL code generated by high-level synthesis (HLS) from C programs with intensive array accesses. In scalar replacement, data accessed from arrays are stored in shift registers, and later array accesses to the same data are replaced with accesses to the shift registers instead of the arrays. In other words, scalar replacement replaces array accesses with shift register accesses. Since arrays in C programs are usually mapped to RAMs with limited numbers of ports, reducing array accesses with scalar replacement reduces memory accesses, which in turn improves the performance of the resulting RTL code. In real-life C programs, shift registers must sometimes be initialized conditionally using multiple array accesses, which increases the number of array accesses in main loops. To reduce the conditional array accesses in the main loops, the previous scalar replacement method proposed the use of a loop transformation called loop peeling. Loop peeling significantly increases code size, which negatively affects the performance or circuit area of the synthesized hardware. In this paper, we propose a new method to initialize shift registers without loop peeling. The proposed method works as a preprocessing step on the input C program prior to scalar replacement. Experimental results demonstrate that the proposed method reduces the number of execution cycles of the synthesized hardware compared to the previous method.
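
As a rough illustration of the general transformation described in this abstract (not the paper's initialization method), the following C sketch applies scalar replacement by hand to a hypothetical 3-tap loop. The function names, the loop body, and the two-register window are assumptions made for this example; the registers are initialized once before the loop, which is the simple unconditional case rather than the conditional one the paper targets.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: every iteration reads a[] three times. */
void stencil_baseline(const int *a, int *b, size_t n) {
    assert(a != NULL && b != NULL);
    for (size_t i = 2; i < n; i++)
        b[i] = a[i] + a[i - 1] + a[i - 2];
}

/* After scalar replacement: every iteration reads a[] once.
 * r1 and r2 act as a two-stage shift register holding a[i-1] and a[i-2]. */
void stencil_scalar_replaced(const int *a, int *b, size_t n) {
    assert(a != NULL && b != NULL);
    if (n < 3)
        return;                 /* the baseline loop would not run either */
    int r1 = a[1];              /* shift register initialization */
    int r2 = a[0];
    for (size_t i = 2; i < n; i++) {
        int r0 = a[i];          /* the only array read in the loop body */
        b[i] = r0 + r1 + r2;
        r2 = r1;                /* shift the window by one element */
        r1 = r0;
    }
}
```

With `r1` and `r2` in place, the loop body issues one read of `a[]` per iteration instead of three, which is the kind of reduction that matters when `a` is mapped to a RAM with few ports.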

Citations: 2

A Logic Optimization Method by Eliminating Redundant Multiple Faults from Higher to Lower Cardinality

IPSJ Transactions on System LSI Design Methodology Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.35

P. Wang, A. M. Gharehbaghi, M. Fujita

Citations: 0

Real Circuit Delay Measurement Method by Variable Frequency Operation with On-Chip Fine Resolution Oscillator

IPSJ Transactions on System LSI Design Methodology Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.21

K. Shimamura, Naohiro Ikeda

Pages: 21-30

Abstract: With the progress of semiconductor process miniaturization, delay degradation due to aging increases and threatens the reliability of fabricated chips. The amount of delay degradation is known to be circuit and workload dependent, but previous evaluations are based on simulations, and measurement of delay degradation in a real circuit under a realistic workload has not yet been reported. This paper proposes a real-circuit delay measurement method that achieves enough accuracy to measure circuit- and workload-dependent delay degradation. In the proposed method, an on-chip oscillator supplies a fine-resolution variable-frequency clock to the internal circuit. The internal circuit executes a test pattern to activate critical paths at various frequencies and determines the maximum frequency at which correct results can be obtained. The maximum frequency corresponds to the delay of the critical paths activated by the test pattern. Clock multiplication improves delay resolution, and repeated measurement reduces the measurement error caused by time-dependent random delay variation. The proposed method has been implemented on a test chip in a 65 nm low-power process. The variable-frequency oscillator uses only standard cells and is designed with an automatic layout flow without any timing tuning. The area overhead of the proposed method is 0.09% of the total random logic. The evaluation results show that an average measurement accuracy of 0.18% has been achieved.
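
The frequency sweep described in this abstract can be sketched in software as follows. This is only a simplified model under stated assumptions: the callback type `pattern_test_fn`, the kHz and picosecond units, the linear upward sweep, and the stand-in `simulated_path_passes` are all hypothetical and are not taken from the paper, and clock multiplication and repeated measurement are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback: runs the test pattern at freq_khz and reports
 * whether the results were correct. */
typedef bool (*pattern_test_fn)(unsigned long freq_khz, void *ctx);

/* Clock period in picoseconds for a frequency given in kHz
 * (1 kHz corresponds to a period of 1e9 ps; integer division truncates). */
unsigned long period_ps(unsigned long freq_khz) {
    assert(freq_khz > 0);
    return 1000000000UL / freq_khz;
}

/* Hypothetical stand-in for the chip: the pattern passes whenever the
 * clock period is at least the critical path delay pointed to by ctx. */
bool simulated_path_passes(unsigned long freq_khz, void *ctx) {
    unsigned long delay_ps = *(const unsigned long *)ctx;
    return period_ps(freq_khz) >= delay_ps;
}

/* Sweeps upward from lo_khz in steps of step_khz and returns the highest
 * frequency at which the pattern still passes, or 0 if it fails at lo_khz.
 * Assumes the pattern passes at all frequencies below the limit. */
unsigned long find_max_freq_khz(pattern_test_fn run, void *ctx,
                                unsigned long lo_khz, unsigned long hi_khz,
                                unsigned long step_khz) {
    assert(run != NULL && step_khz > 0 && lo_khz > 0 && lo_khz <= hi_khz);
    unsigned long best = 0;
    for (unsigned long f = lo_khz; f <= hi_khz; f += step_khz) {
        if (!run(f, ctx))
            break;              /* first failing frequency ends the sweep */
        best = f;
    }
    return best;
}
```

In this model the measured delay is `period_ps(find_max_freq_khz(...))`, so the step size of the sweep directly sets the delay resolution, which is why a fine-resolution oscillator matters for the method.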

Citations: 1

An FPGA Implementation Method based on Distributed-register Architectures

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-02-01 DOI: 10.2197/ipsjtsldm.12.38

Koichi Fujiwara, Kazushi Kawamura, M. Yanagisawa, N. Togawa

Citations: 0

Circuit Techniques for Device-Circuit Interaction toward Minimum Energy Operation

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-02-01 DOI: 10.2197/ipsjtsldm.12.2

A. Islam, H. Onodera

Citations: 3

Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.22

Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi

Citations: 4

Scalar Replacement with Circular Buffers

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.13

Kenshu Seto

Pages: 13-21

Abstract: Scalar replacement is an effective array access optimization that can be applied before high-level synthesis (HLS). The successful application of scalar replacement removes local memories and, as a result, decreases hardware area. In addition, scalar replacement reduces the number of hardware execution cycles by reducing memory access conflicts. In scalar replacement, shift registers are introduced to remove local arrays, and reuse distances correspond to the lengths of the shift registers. Previous scalar replacement methods implement the shift registers with chains of registers, so the hardware area becomes large when the reuse distances are large. In addition, when reuse distances are unknown at compile time, previous scalar replacement methods require multiplexers with large numbers of inputs, which further increase hardware area. In this paper, we propose a new technique to resolve these issues. In particular, we implement the shift registers with circular buffers instead of chains of registers. Large shift registers implemented by RAM-based circular buffers are more compact than those implemented by chains of registers. We also show that the proposed method requires no multiplexers to realize scalar replacement for loops with statically unknown reuse distances, which leads to area-efficient hardware implementation. We developed a tool that implements the method and applied it to benchmark programs that require large shift registers or have statically unknown reuse distances. We found that the proposed method reduces hardware area compared to the previous method without sacrificing hardware performance. We conclude that the proposed method is an area-efficient scalar replacement method for programs that have large or unknown reuse distances at compile time.
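
To make the circular-buffer idea in this abstract concrete, here is a minimal C sketch of a shift register whose length (the reuse distance `d`) is only known at run time. The function name `delayed_sum`, the loop it implements, and the bound `MAX_DIST` are assumptions chosen for illustration and do not come from the paper or its tool.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIST 64  /* hypothetical upper bound on the reuse distance */

/* Computes b[i] = a[i] + a[i - d] for i in [d, n), reading a[] once per
 * element. The last d values of a[] are kept in a circular buffer, so the
 * same code works for any 1 <= d <= MAX_DIST without selecting among
 * register-chain taps. Elements b[0..d-1] are left untouched. */
void delayed_sum(const int *a, int *b, size_t n, size_t d) {
    assert(a != NULL && b != NULL);
    assert(d >= 1 && d <= MAX_DIST);
    int buf[MAX_DIST];          /* would map to a small RAM in hardware */
    size_t pos = 0;             /* invariant: pos == i % d */
    for (size_t i = 0; i < n; i++) {
        int cur = a[i];         /* single array read per iteration */
        if (i >= d)
            b[i] = cur + buf[pos];  /* buf[pos] still holds a[i - d] */
        buf[pos] = cur;         /* overwrite the oldest entry */
        pos = (pos + 1 == d) ? 0 : pos + 1;  /* wrap after d entries */
    }
}
```

Because the slot about to be overwritten is always the one written `d` iterations earlier, a single read address and a single write address suffice, whatever value `d` takes at run time.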

Citations: 3

An OpenCL-based Software Framework for a Heterogeneous Multicore Architecture on Zynq-7000 SoC

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.46

T. Miyazaki, Shunsuke Takai, Ittetsu Taniguchi, H. Tomiyama

Citations: 1

Neuromorphic Computing Systems: From CMOS To Emerging Nonvolatile Memory

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.53

Chaofei Yang, Ximing Qiao, Yiran Chen

Pages: 53-64

Abstract: The end of Moore's Law and the von Neumann bottleneck motivate researchers to seek alternative architectures that can fulfill the increasing demand for computation resources, which cannot be easily met by the traditional computing paradigm. As one important practice, neuromorphic computing systems (NCS) are proposed to mimic the biological behaviors of neurons and synapses and to accelerate the computation of neural networks. Traditional CMOS-based implementations of NCS, however, are subject to the large hardware cost required to precisely replicate the biological properties. In the most recent decade, emerging nonvolatile memory (eNVM) was introduced to NCS design due to its high computing efficiency and integration density. Similar to circuits built on other nanoscale devices, eNVM-based NCS also suffer from many reliability issues. In this paper, we give a short survey of CMOS- and eNVM-based NCS, including their basic implementations and their training and inference schemes in various applications. We also discuss the design challenges of these NCS and introduce some techniques that can improve the reliability, precision, scalability, and security of the NCS. At the end, we provide our insights on the design trends and future challenges of NCS.

Citations: 0

A Genetic Algorithm for Scheduling of Data-parallel Tasks on Multicore Architectures

IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.74

Yang Liu, Lin Meng, H. Tomiyama

Citations: 4