IPSJ Transactions on System LSI Design Methodology: Latest Articles

Shift Register Initialization in Scalar Replacement for Reducing Code Size
IPSJ Transactions on System LSI Design Methodology Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.2
Kenshu Seto
{"title":"Shift Register Initialization in Scalar Replacement for Reducing Code Size","authors":"Kenshu Seto","doi":"10.2197/ipsjtsldm.13.2","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.13.2","url":null,"abstract":"Scalar replacement is an effective technique to improve the performance of the RTL code generated by high-level synthesis (HLS) from C programs with intensive array accesses. In scalar replacement, data accessed from arrays are stored into shift registers, and later array accesses to the same data are replaced with accesses to the shift registers instead of the arrays. In other words, scalar replacement replaces array accesses with shift register accesses. Since arrays in C programs are usually mapped to RAMs with limited numbers of ports, reducing array accesses with scalar replacement reduces memory accesses, which in turn improves the performance of the resulting RTL code. In real-life C programs, shift registers must sometimes be initialized conditionally using multiple array accesses, which increases the number of array accesses in main loops. To reduce the conditional array accesses in the main loops, the previous scalar replacement method proposed the use of a loop transformation called loop peeling. Loop peeling significantly increases code size, which negatively affects the performance or circuit area of the synthesized hardware. In this paper, we propose a new method to initialize shift registers without loop peeling. The proposed method works as a preprocessing step on the input C program prior to scalar replacement. With experimental results, we demonstrate that the proposed method reduces the number of execution cycles of the synthesized hardware compared to the previous method.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"105 1","pages":"2-9"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"79279120","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
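The basic idea of scalar replacement described in the abstract above can be illustrated with a small sketch. This is only a Python model of plain scalar replacement on a hypothetical 3-point sum (the paper itself targets C programs for HLS); the function names and the read counters are assumptions made for illustration, and the sketch does not implement the paper's proposed initialization method.

```python
def stencil_naive(a):
    """Sum three neighbouring elements; 3 array reads per iteration."""
    reads = 0
    out = []
    for i in range(2, len(a)):
        out.append(a[i - 2] + a[i - 1] + a[i])
        reads += 3
    return out, reads


def stencil_scalar_replaced(a):
    """Same result using a 3-entry shift register; 1 array read per iteration."""
    reads = 0
    out = []
    if len(a) < 3:
        return out, reads
    # Initialize the shift register once, before the main loop (2 reads).
    r2, r1 = a[0], a[1]
    reads += 2
    for i in range(2, len(a)):
        r0 = a[i]            # the only array read inside the loop body
        reads += 1
        out.append(r2 + r1 + r0)
        r2, r1 = r1, r0      # shift the register contents by one position
    return out, reads
```

For a five-element input, the naive version performs nine array reads and the scalar-replaced version five, while both produce the same sums. In hardware, the saving matters because the array is mapped to a RAM with few ports.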
A Logic Optimization Method by Eliminating Redundant Multiple Faults from Higher to Lower Cardinality
IPSJ Transactions on System LSI Design Methodology Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.35
P. Wang, A. M. Gharehbaghi, M. Fujita
{"title":"A Logic Optimization Method by Eliminating Redundant Multiple Faults from Higher to Lower Cardinality","authors":"P. Wang, A. M. Gharehbaghi, M. Fujita","doi":"10.2197/ipsjtsldm.13.35","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.13.35","url":null,"abstract":"In this paper, we propose a logic optimization method to remove the redundancy in a circuit. The incremental Automatic Test Pattern Generation method is used to find the redundant multiple faults. In order to remove as many redundancies as possible, instead of removing the redundant single faults first, we clear up the redundant faults from higher cardinality to lower cardinality. The experiments show that the proposed method can successfully eliminate more redundancies compared to the redundancy removal command in the synthesis tool SIS.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"46 1","pages":"35-38"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"81283440","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Real Circuit Delay Measurement Method by Variable Frequency Operation with On-Chip Fine Resolution Oscillator
IPSJ Transactions on System LSI Design Methodology Pub Date : 2020-01-01 DOI: 10.2197/ipsjtsldm.13.21
K. Shimamura, Naohiro Ikeda
{"title":"Real Circuit Delay Measurement Method by Variable Frequency Operation with On-Chip Fine Resolution Oscillator","authors":"K. Shimamura, Naohiro Ikeda","doi":"10.2197/ipsjtsldm.13.21","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.13.21","url":null,"abstract":"With the progress of semiconductor process miniaturization, delay degradation by aging increases and threatens the reliability of fabricated chips. The amount of delay degradation is known to be circuit and workload dependent, but previous evaluations are based on simulations, and delay degradation measurement of a real circuit under a realistic workload has not been reported yet. This paper proposes a real circuit delay measurement method that achieves enough accuracy to measure circuit- and workload-dependent delay degradation. In the proposed method, an on-chip oscillator supplies a fine-resolution variable-frequency clock to the internal circuit. The internal circuit executes a test pattern that activates critical paths at various frequencies, and the maximum frequency at which correct results can be obtained is determined. The maximum frequency corresponds to the delay of the critical paths activated by the test pattern. Clock multiplication improves delay resolution, and repetitive measurement reduces measurement error caused by time-dependent random delay variation. The proposed method has been implemented on a 65 nm low-power process test chip. The variable frequency oscillator utilizes only standard cells and is designed with an automatic layout flow without any timing tuning. The area overhead of the proposed method is 0.09% of the total random logic. The evaluation results show that 0.18% average measurement accuracy has been achieved.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"149 1","pages":"21-30"},"PeriodicalIF":0.0,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73915270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
An FPGA Implementation Method based on Distributed-register Architectures
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-02-01 DOI: 10.2197/ipsjtsldm.12.38
Koichi Fujiwara, Kazushi Kawamura, M. Yanagisawa, N. Togawa
{"title":"An FPGA Implementation Method based on Distributed-register Architectures","authors":"Koichi Fujiwara, Kazushi Kawamura, M. Yanagisawa, N. Togawa","doi":"10.2197/ipsjtsldm.12.38","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.38","url":null,"abstract":"","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"50 1","pages":"38-41"},"PeriodicalIF":0.0,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86945903","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Circuit Techniques for Device-Circuit Interaction toward Minimum Energy Operation
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-02-01 DOI: 10.2197/ipsjtsldm.12.2
A. Islam, H. Onodera
{"title":"Circuit Techniques for Device-Circuit Interaction toward Minimum Energy Operation","authors":"A. Islam, H. Onodera","doi":"10.2197/ipsjtsldm.12.2","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.2","url":null,"abstract":"","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"18 1","pages":"2-12"},"PeriodicalIF":0.0,"publicationDate":"2019-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87699381","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.22
Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi
{"title":"Parallelism-flexible Convolution Core for Sparse Convolutional Neural Networks on FPGA","authors":"Salita Sombatsiri, S. Shibata, Yuki Kobayashi, Hiroaki Inoue, Takashi Takenaka, T. Hosomi, Jaehoon Yu, Yoshinori Takeuchi","doi":"10.2197/ipsjtsldm.12.22","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.22","url":null,"abstract":"This paper proposes a convolution core for sparse CNNs that is capable of flexibly alternating the parallelism scheme and degree, exploiting intra- and inter-output parallelism of the convolutional layer, and leveraging weight sparsity using a compressed sparse model in the compressed sparse column format and an output-stationary dataflow. The experimental results show that the performance is improved by 3.9 times even in the deeper layers, where the conventional accelerator could not fully exploit the parallelism due to the small layer size. The proposed architecture could also exploit the weight sparsity. By combining both the multi-parallelism and the weight sparsity, the proposed architecture achieved 5.2 times better performance than the conventional accelerator.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"9 1","pages":"22-37"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78625225","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4
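The compressed sparse column (CSC) format mentioned in the abstract above stores only the nonzero weights, column by column. The following Python sketch illustrates the storage format alone; the function names are hypothetical, and it does not model the paper's convolution core or its output-stationary dataflow.

```python
def to_csc(dense):
    """Convert a dense matrix (list of rows) into CSC arrays."""
    n_rows = len(dense)
    n_cols = len(dense[0]) if n_rows else 0
    values, row_idx, col_ptr = [], [], [0]
    for c in range(n_cols):
        for r in range(n_rows):
            if dense[r][c] != 0:
                values.append(dense[r][c])   # nonzero value
                row_idx.append(r)            # its row index
        col_ptr.append(len(values))          # end of column c in `values`
    return values, row_idx, col_ptr


def csc_matvec(values, row_idx, col_ptr, x, n_rows):
    """Multiply a CSC matrix by vector x, touching only nonzero entries."""
    y = [0] * n_rows
    for c in range(len(col_ptr) - 1):
        for k in range(col_ptr[c], col_ptr[c + 1]):
            y[row_idx[k]] += values[k] * x[c]
    return y
```

A column with no nonzero entries shows up as two equal consecutive entries in `col_ptr`, so zero weights cost neither storage nor multiplications.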
Scalar Replacement with Circular Buffers
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.13
Kenshu Seto
{"title":"Scalar Replacement with Circular Buffers","authors":"Kenshu Seto","doi":"10.2197/ipsjtsldm.12.13","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.13","url":null,"abstract":"Scalar replacement is one of the effective array access optimizations that can be applied before high-level synthesis (HLS). The successful application of scalar replacement removes local memories and, as a result, decreases hardware area. In addition, scalar replacement reduces the number of hardware execution cycles by reducing memory access conflicts. In scalar replacement, shift registers are introduced to remove local arrays, and reuse distances correspond to the lengths of the shift registers. Previous scalar replacement methods implement the shift registers with chains of registers, so the hardware area becomes large when the reuse distances are large. In addition, when reuse distances are unknown at compile time, previous scalar replacement methods require multiplexers with large numbers of inputs, which further increase hardware area. In this paper, we propose a new technique to resolve these issues. In particular, we implement the shift registers with circular buffers instead of chains of registers. Large shift registers implemented by RAM-based circular buffers are more compact than those implemented by chains of registers. We also show that the proposed method requires no multiplexers to realize scalar replacement for loops with statically unknown reuse distances, which leads to area-efficient hardware implementation. We developed a tool that implements the method and applied the tool to benchmark programs that require large shift registers or have statically unknown reuse distances. We found that the hardware area is reduced with the proposed method compared to the previous method without sacrificing the hardware performance. We conclude that the proposed method is an area-efficient scalar replacement method for programs that have large or unknown reuse distances at compile time.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"89 1","pages":"13-21"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82540316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
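A circular buffer can stand in for a shift register whose length, the reuse distance, is only known at run time. The Python sketch below is a minimal software model under that assumption; the function name, the example computation `a[i] + a[i - d]`, and the single read/write pointer are illustrative choices, not the tool or hardware described in the paper.

```python
def delayed_sum_circular(a, d):
    """Compute a[i] + a[i - d] for i >= d, reading each a[i] from the array once."""
    if d <= 0:
        raise ValueError("reuse distance d must be positive")
    buf = [0] * d    # models a RAM-based circular buffer of length d
    head = 0         # slot holding the oldest value; also the next write slot
    out = []
    for i, x in enumerate(a):          # x models the single array read per iteration
        if i >= d:
            out.append(x + buf[head])  # buf[head] currently holds a[i - d]
        buf[head] = x                  # overwrite the oldest value with the newest
        head = (head + 1) % d          # advance the pointer instead of shifting data
    return out
```

Because only the pointer moves, no data is shifted and no wide multiplexer is needed to select among register positions when `d` changes.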
An OpenCL-based Software Framework for a Heterogeneous Multicore Architecture on Zynq-7000 SoC
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.46
T. Miyazaki, Shunsuke Takai, Ittetsu Taniguchi, H. Tomiyama
{"title":"An OpenCL-based Software Framework for a Heterogeneous Multicore Architecture on Zynq-7000 SoC","authors":"T. Miyazaki, Shunsuke Takai, Ittetsu Taniguchi, H. Tomiyama","doi":"10.2197/ipsjtsldm.12.46","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.46","url":null,"abstract":"This paper presents an OpenCL-based software framework which we have developed for a heterogeneous multicore architecture on Zynq-7000 SoC. In this work, the heterogeneous architecture is designed with two hard-macro Cortex-A9 cores and two soft-macro MicroBlaze cores. A major advantage of our OpenCL framework is that it can execute OpenCL kernel programs in three ways. Experiments show the usefulness of the OpenCL framework.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"1 1","pages":"46-49"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88776011","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Neuromorphic Computing Systems: From CMOS To Emerging Nonvolatile Memory
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.53
Chaofei Yang, Ximing Qiao, Yiran Chen
{"title":"Neuromorphic Computing Systems: From CMOS To Emerging Nonvolatile Memory","authors":"Chaofei Yang, Ximing Qiao, Yiran Chen","doi":"10.2197/ipsjtsldm.12.53","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.53","url":null,"abstract":"The end of Moore’s Law and the von Neumann bottleneck motivate researchers to seek alternative architectures that can fulfill the increasing demand for computation resources, which cannot be easily met by the traditional computing paradigm. As one important practice, neuromorphic computing systems (NCS) are proposed to mimic biological behaviors of neurons and synapses and to accelerate computation of neural networks. Traditional CMOS-based implementations of NCS, however, are subject to the large hardware cost required to precisely replicate the biological properties. In the most recent decade, emerging nonvolatile memory (eNVM) was introduced to NCS design due to its high computing efficiency and integration density. Similar to circuits built on other nanoscale devices, eNVM-based NCS also suffer from many reliability issues. In this paper, we give a short survey of CMOS- and eNVM-based NCS, including their basic implementations and their training and inference schemes in various applications. We also discuss the design challenges of these NCS and introduce some techniques that can improve the reliability, precision, scalability, and security of the NCS. At the end, we provide our insights on the design trends and future challenges of the NCS.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"94 1","pages":"53-64"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84303031","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Genetic Algorithm for Scheduling of Data-parallel Tasks on Multicore Architectures
IPSJ Transactions on System LSI Design Methodology Pub Date : 2019-01-01 DOI: 10.2197/ipsjtsldm.12.74
Yang Liu, Lin Meng, H. Tomiyama
{"title":"A Genetic Algorithm for Scheduling of Data-parallel Tasks on Multicore Architectures","authors":"Yang Liu, Lin Meng, H. Tomiyama","doi":"10.2197/ipsjtsldm.12.74","DOIUrl":"https://doi.org/10.2197/ipsjtsldm.12.74","url":null,"abstract":"This paper proposes a genetic algorithm for scheduling of multiple data-parallel tasks on multicores. Unlike traditional task scheduling, this work allows individual tasks to run on multiple cores in a data-parallel fashion. Experimental results show the effectiveness of the proposed algorithm over state-of-the-art","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"11 1","pages":"74-77"},"PeriodicalIF":0.0,"publicationDate":"2019-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83670586","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 4