IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

A Hybrid Domain and Pipelined Analog Computing Chain for MVM Computation
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3439355
Tianzhu Xiong;Yuyang Ye;Xin Si;Jun Yang
{"title":"A Hybrid Domain and Pipelined Analog Computing Chain for MVM Computation","authors":"Tianzhu Xiong;Yuyang Ye;Xin Si;Jun Yang","doi":"10.1109/TVLSI.2024.3439355","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3439355","url":null,"abstract":"In this article, a stream-architecture and pipelined hybrid computing chain is presented to process matrix-vector multiplication (MVM). In each stage of the computing chain, a primary multiply-accumulate (MAC) stage consisting of charge, time, and digital domain processing units makes signed or unsigned \u0000<inline-formula> <tex-math>$8times 1times 8$ </tex-math></inline-formula>\u0000 bit MAC operations and MSB quantization. Based on the stream architecture, the length of the computing chain can be configured to fit different MVM applications. In the charge-domain MAC unit, a double-plate sampling and weighted capacitor array with writing yield and efficiency enhanced 7T bitcell and three-step weighting scheme is implemented. To utilize the speed and resolution advantages of time-domain computing, a high linearity voltage-to-time converter (VTC) followed by a dynamic tristate delay chain is proposed to transfer and store MAC values from the charge domain in the time domain. To realize fast analog readout, a folding type and distributed time-to-digital converter (TDC) is proposed. To fully eliminate the offset and variation in the distributed TDC, a specific residue readout timing and back-end calibration scheme are applied. In the digital domain, a double-input and double-clock dynamic D flip-flop is built to realize partial sum transmission and accumulation in a single cycle with low energy and area consumption. Post-simulation results show that this computing chain can achieve 20.89–40.72-TOPS/W energy efficiency and 4.498-TOPS/mm2 throughput.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"52-65"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
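The chain described above is a mixed-signal pipeline, but its arithmetic contract is easy to state in software. The sketch below is a purely behavioral NumPy model (the stage width of 8 matches the abstract; the `stage_mac`/`chain_mvm` names are hypothetical), showing how 8-element signed MAC stages with forwarded partial sums reproduce a matrix-vector product. None of the charge-, time-, or digital-domain circuitry is modeled.

```python
# Behavioral sketch (not the authors' circuit): each pipeline stage performs an
# 8-element signed 8-bit multiply-accumulate and forwards the running partial sum.
import numpy as np

def stage_mac(x, w, partial_in=0):
    """One hypothetical chain stage: an 8x1x8-bit signed MAC plus the incoming partial sum."""
    assert x.shape == (8,) and w.shape == (8,)
    return partial_in + int(np.dot(x.astype(np.int32), w.astype(np.int32)))

def chain_mvm(W, x, stages=4):
    """Matrix-vector product evaluated as a pipeline of 8-wide MAC stages."""
    n_rows, n_cols = W.shape
    assert n_cols == 8 * stages, "each row is split across `stages` 8-element chunks"
    y = np.zeros(n_rows, dtype=np.int64)
    for r in range(n_rows):
        acc = 0
        for s in range(stages):                      # one chunk per chain stage
            sl = slice(8 * s, 8 * (s + 1))
            acc = stage_mac(x[sl], W[r, sl], acc)    # partial sum streams forward
        y[r] = acc
    return y

rng = np.random.default_rng(0)
W = rng.integers(-128, 128, size=(4, 32), dtype=np.int8)
x = rng.integers(-128, 128, size=32, dtype=np.int8)
assert np.array_equal(chain_mvm(W, x), W.astype(np.int32) @ x.astype(np.int32))
```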
A Fast-Convergence Near-Memory-Computing Accelerator for Solving Partial Differential Equations
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3458801
Chenjia Xie;Zhuang Shao;Ning Zhao;Xingyuan Hu;Yuan Du;Li Du
{"title":"A Fast-Convergence Near-Memory-Computing Accelerator for Solving Partial Differential Equations","authors":"Chenjia Xie;Zhuang Shao;Ning Zhao;Xingyuan Hu;Yuan Du;Li Du","doi":"10.1109/TVLSI.2024.3458801","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3458801","url":null,"abstract":"Solving partial differential equations (PDEs) is omnipresent in scientific research and engineering and requires expensive numerical iteration for memory and computation. The primary concerns for solving PDEs are convergence speed, data movement, and power consumption. This work proposed the first fast-convergence PDE solver with an automatic adjustment multiple-stride iteration method, significantly increasing the PDE convergence speed. A dynamic-precision near-memory-computing architecture with booth encoding is proposed to reduce iterated intermediate data movement. A customized 32T compressor and a 14T full adder are designed to reduce the power and hardware cost of the solver. The processor is fabricated using 65-nm CMOS technology and occupies a 6.25 mm2 die area. It can achieve a convergence speedup by <inline-formula> <tex-math>$4times $ </tex-math></inline-formula> compared with the existing work.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"578-582"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992865","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
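For readers unfamiliar with the workload, the sketch below shows the kind of memory-bound stencil iteration such a solver accelerates: a plain Jacobi update for a 2-D Laplace boundary-value problem. The paper's automatic-adjustment multiple-stride method, Booth-encoded near-memory datapath, and custom adders are not reproduced; the grid size, tolerance, and boundary values are arbitrary choices for illustration.

```python
# Baseline stencil iteration for a 2-D Laplace boundary-value problem -- the kind of
# memory-bound kernel the accelerator targets. Only the reference Jacobi update is shown.
import numpy as np

def jacobi_laplace(u, tol=1e-4, max_iters=100_000):
    """Iterate u[i,j] <- mean of its 4 neighbors until the largest update falls below `tol`."""
    u = u.copy()
    for it in range(1, max_iters + 1):
        new = u.copy()
        new[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1] +
                                  u[1:-1, :-2] + u[1:-1, 2:])
        delta = np.max(np.abs(new - u))
        u = new
        if delta < tol:
            return u, it
    return u, max_iters

grid = np.zeros((64, 64))
grid[0, :] = 1.0                 # fixed boundary condition on the top edge
solution, iters = jacobi_laplace(grid)
print(f"converged in {iters} Jacobi iterations")
```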
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3457191
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2024.3457191","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3457191","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 10","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695157","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142324285","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hardware–Algorithm Codesigned Low-Latency and Resource-Efficient OMP Accelerator for DOA Estimation on FPGA
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3462467
Ruichang Jiang;Wenbin Ye
{"title":"Hardware–Algorithm Codesigned Low-Latency and Resource-Efficient OMP Accelerator for DOA Estimation on FPGA","authors":"Ruichang Jiang;Wenbin Ye","doi":"10.1109/TVLSI.2024.3462467","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3462467","url":null,"abstract":"This article introduces an algorithm-hardware codesign optimized for low-latency and resource-efficient direction-of-arrival (DOA) estimation, employing a refined orthogonal matching pursuit (OMP) algorithm adept at handling the complexities of multisource detection, particularly in scenarios with closely spaced signal sources. At the algorithmic level, this approach incorporates a secondary correction mechanism (SCM) into the traditional OMP algorithm, significantly improving estimation accuracy and robustness. On the hardware front, a bespoke OMP accelerator has been developed, featuring a reconfigurable generic processing element (PE) array that supports various computational modes and leverages multilevel spectral peak search strategy and pipelining techniques to enhance computational efficiency. Experimental evaluations reveal that the proposed system achieves a root mean square error (RMSE) for DOA estimation of less than 0.3° in multisource conditions with a signal-to-noise ratio (SNR) of 20 dB. In addition, the deployment of the OMP accelerator on a Zynq XC7Z020 development board utilizes modest logic resources: 5.49k LUTs, 3.28k FFs, 11.5 BRAMs, and 32 DSPs. Furthermore, the design achieves a computational latency of <inline-formula> <tex-math>$2.83~mu text { s}$ </tex-math></inline-formula> for single-source estimation with eight antennas. This achievement reflects a reduction of approximately 17.8% in LUTs, 56.3% in FFs, and 5.7% in DSPs compared to current leading-edge technologies after normalization all while maintaining competitive estimation accuracy and favorable estimation rates.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"421-434"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992938","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
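As a point of reference for the algorithm the accelerator implements, here is textbook orthogonal matching pursuit over a uniform-linear-array steering dictionary in NumPy. The paper's secondary correction mechanism (SCM), multilevel spectral peak search, and all FPGA-level details are omitted; the 8-antenna array, 0.5° grid, and the two test angles are illustrative assumptions.

```python
# Textbook OMP over a ULA steering dictionary: greedily pick the candidate angles whose
# steering vectors best explain the residual of the measured snapshot.
import numpy as np

def steering_matrix(n_ant, angles_deg, spacing=0.5):
    """Far-field ULA steering vectors on a grid of candidate angles (half-wavelength spacing)."""
    theta = np.deg2rad(np.asarray(angles_deg))
    k = np.arange(n_ant)[:, None]
    return np.exp(2j * np.pi * spacing * k * np.sin(theta)[None, :])

def omp(y, A, n_sources):
    """Select `n_sources` dictionary columns, refitting the amplitudes by least squares each step."""
    residual, support = y.copy(), []
    for _ in range(n_sources):
        support.append(int(np.argmax(np.abs(A.conj().T @ residual))))
        coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coeffs
    return sorted(support)

grid = np.arange(-90, 90.5, 0.5)            # 0.5-degree search grid
A = steering_matrix(8, grid)                # 8-antenna array, as in the paper's test case
true_idx = [int(np.where(grid == a)[0][0]) for a in (-20.0, 15.0)]
rng = np.random.default_rng(1)
y = A[:, true_idx] @ np.array([1.0, 0.8]) + 0.01 * (rng.standard_normal(8)
                                                    + 1j * rng.standard_normal(8))
print([grid[i] for i in omp(y, A, 2)])      # recovered DOAs (degrees)
```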
MBSNTT: A Highly Parallel Digital In-Memory Bit-Serial Number Theoretic Transform Accelerator
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3462955
Akhil Pakala;Zhiyu Chen;Kaiyuan Yang
{"title":"MBSNTT: A Highly Parallel Digital In-Memory Bit-Serial Number Theoretic Transform Accelerator","authors":"Akhil Pakala;Zhiyu Chen;Kaiyuan Yang","doi":"10.1109/TVLSI.2024.3462955","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3462955","url":null,"abstract":"Conventional cryptographic systems protect the data security during communication but give third-party cloud operators complete access to compute decrypted user data. Homomorphic encryption (HE) promises to rectify this and allow computations on encrypted data to be done without actually decrypting it. However, HE encryption requires several orders of magnitude higher latency than conventional encryption schemes. Number theoretic transform (NTT), a polynomial multiplication algorithm, is the bottleneck function in HE. In traditional architectures, memory accesses and support for parallel operations limit NTT’s throughput and energy efficiency. Processing in memory (PIM) is an interesting approach that can maximize parallelism with high-energy efficiency. To enable HE on resource-constrained edge devices, this article presents MBSNTT, a digital in-memory Multi-Bit-Serial NTT accelerator, achieving high parallelism and energy efficiency for NTT with minimized area. MBSNTT features a novel multi-bit-serial modular multiplication algorithm and PIM implementation that computes all modular multiplications in an NTT in parallel. It further adopts a constant geometry NTT data flow for efficient transition between NTT stages and different cores. Our evaluation shows that MBSNTT achieves <inline-formula> <tex-math>$1.62times $ </tex-math></inline-formula> (<inline-formula> <tex-math>$19.08times $ </tex-math></inline-formula>) higher throughput and <inline-formula> <tex-math>$64.9times $ </tex-math></inline-formula> (<inline-formula> <tex-math>$2.06times $ </tex-math></inline-formula>) lower energy than state-of-the-art PIM NTT accelerators Crypto-PIM (MeNTT), at a polynomial order of 8 K and bit width of 128.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"537-545"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993409","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
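The kernel being accelerated is the number theoretic transform itself. Below is a minimal recursive radix-2 NTT modulo a small prime, just to show the butterfly structure that MBSNTT evaluates with in-memory bit-serial modular multipliers. The paper's 128-bit moduli, constant-geometry data flow, and PIM mapping are not modeled; the prime and root of unity here are chosen only for demonstration.

```python
# Minimal radix-2 number theoretic transform (cyclic, power-of-two length).

def ntt(a, omega, p):
    """Recursive Cooley-Tukey NTT of list `a` modulo prime `p` with n-th root of unity `omega`."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % p, p)
    odd = ntt(a[1::2], omega * omega % p, p)
    out, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p              # butterfly: sum branch
        out[k + n // 2] = (even[k] - t) % p     # butterfly: difference branch
        w = w * omega % p
    return out

def intt(a, omega, p):
    """Inverse NTT: forward transform with omega^{-1}, scaled by n^{-1} mod p."""
    n = len(a)
    inv_n = pow(n, p - 2, p)
    return [x * inv_n % p for x in ntt(a, pow(omega, p - 2, p), p)]

p, n, omega = 17, 8, 9                         # 9 has multiplicative order 8 modulo 17
x = [1, 2, 3, 4, 0, 0, 0, 0]
assert intt(ntt(x, omega, p), omega, p) == x   # round trip recovers the input
```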
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3457193
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2024.3457193","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3457193","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 10","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695474","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142324368","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Hi-NeRF: A Multicore NeRF Accelerator With Hierarchical Empty Space Skipping for Edge 3-D Rendering
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-24. DOI: 10.1109/TVLSI.2024.3458032
Lizhou Wu;Haozhe Zhu;Jiapei Zheng;Mengjie Li;Yinuo Cheng;Qi Liu;Xiaoyang Zeng;Chixiao Chen
{"title":"Hi-NeRF: A Multicore NeRF Accelerator With Hierarchical Empty Space Skipping for Edge 3-D Rendering","authors":"Lizhou Wu;Haozhe Zhu;Jiapei Zheng;Mengjie Li;Yinuo Cheng;Qi Liu;Xiaoyang Zeng;Chixiao Chen","doi":"10.1109/TVLSI.2024.3458032","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3458032","url":null,"abstract":"Neural radiance field (NeRF) has proved to be promising in augmented/virtual-reality applications. However, the deployment of NeRF on edge devices suffers from inadequate throughput due to redundant ray sampling and congested memory access. To address these challenges, this article proposes Hi-NeRF, a multirendering-core accelerator for efficient edge NeRF rendering. On the architecture level, a hierarchical empty space skipping (HESS) scheme is adopted, which efficiently locates the effective samples with fewer skipping steps and thus accelerates the ray marching process. Furthermore, to alleviate the memory access bottleneck, a vertex-interleaved mapping (VIM) method that eliminates memory bank conflicts is also proposed. On the hardware level, ineffective sample filters (ISFs) and voxel access filters (VCFs) are introduced to further exploit spatial sparsity and data locality at run-time. The experimental results show that our work achieves \u0000<inline-formula> <tex-math>$2.67times $ </tex-math></inline-formula>\u0000 rendering throughput and \u0000<inline-formula> <tex-math>$11.2times $ </tex-math></inline-formula>\u0000 energy efficiency compared to a SOTA NeRF rendering accelerator. The energy efficiency can be improved by \u0000<inline-formula> <tex-math>$561times $ </tex-math></inline-formula>\u0000 compared to a commercial GPU.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2315-2326"},"PeriodicalIF":2.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821184","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
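To make the empty-space-skipping idea concrete, the toy NumPy ray marcher below uses a two-level occupancy grid: a coarse grid max-pooled from a fine one lets the marcher take large strides through empty space and sample finely only inside occupied regions. The scene, grid sizes, and stepping rules are invented for illustration and do not reproduce Hi-NeRF's HESS scheme, ISF/VCF filters, or VIM mapping.

```python
# Toy two-level empty-space skipping along a single ray.
import numpy as np

FINE, COARSE = 64, 8                     # fine grid resolution and coarsening factor
fine = np.zeros((FINE, FINE, FINE), dtype=bool)
fine[40:48, 30:38, 20:28] = True         # one occupied blob in an otherwise empty scene
coarse = fine.reshape(FINE // COARSE, COARSE, FINE // COARSE, COARSE,
                      FINE // COARSE, COARSE).any(axis=(1, 3, 5))

def march(origin, direction, step=0.5, t_max=110.0):
    """Return fine-grid samples taken along the ray, skipping coarse-empty regions in big strides."""
    direction = direction / np.linalg.norm(direction)
    samples, t = [], 0.0
    while t < t_max:
        p = origin + t * direction
        idx = np.floor(p).astype(int)
        if np.any(idx < 0) or np.any(idx >= FINE):
            break
        if not coarse[tuple(idx // COARSE)]:
            t += COARSE              # whole coarse cell is empty: take a large stride
            continue
        if fine[tuple(idx)]:
            samples.append(tuple(idx))
        t += step                    # inside an occupied coarse cell: sample finely
    return samples

hits = march(np.array([1.0, 1.0, 1.0]), np.array([1.0, 0.8, 0.6]))
print(f"{len(hits)} effective samples instead of ~{int(110 / 0.5)} uniform ones")
```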
A Fast Transient Response Distributed Power Supply With Dynamic Output Switching for Power Side-Channel Attack Mitigation
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-24. DOI: 10.1109/TVLSI.2024.3433429
Xingye Liu;Paul Ampadu
{"title":"A Fast Transient Response Distributed Power Supply With Dynamic Output Switching for Power Side-Channel Attack Mitigation","authors":"Xingye Liu;Paul Ampadu","doi":"10.1109/TVLSI.2024.3433429","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3433429","url":null,"abstract":"We present a distributed power supply and explore its load transient response and power side-channel security improvements. Typically, countermeasures against power side-channel attacks (PSCAs) are based on specialized dc/dc converters, resulting in large power and area overheads and they are difficult to scale. Moreover, due to limited output voltage range and load regulation, it is not feasible to directly distribute these converters in multicore applications. Targeting those issues, our proposed converter is designed to provide multiple fast-responding voltages and use shared circuits to mitigate PSCAs. The proposed three-output dc/dc converter can deliver 0.33–0.92 V with up to 1 A to each load. Comparing with state-of-the-art power management works, our converter has \u0000<inline-formula> <tex-math>$2times $ </tex-math></inline-formula>\u0000 load step response speed and \u0000<inline-formula> <tex-math>$4times $ </tex-math></inline-formula>\u0000 reference voltage tracking speed. Furthermore, the converter requires \u0000<inline-formula> <tex-math>$9times $ </tex-math></inline-formula>\u0000 less inductance and \u0000<inline-formula> <tex-math>$3times $ </tex-math></inline-formula>\u0000 less output capacitance. In terms of PSCA mitigation, this converter reduces the correlation between input power trace and encryption load current by \u0000<inline-formula> <tex-math>$107times $ </tex-math></inline-formula>\u0000, which is \u0000<inline-formula> <tex-math>$3times $ </tex-math></inline-formula>\u0000 better than the best standalone work, and it only induces 1.7% area overhead and 2.5% power overhead. The proposed work also increases minimum traces to disclose (MTDs) by \u0000<inline-formula> <tex-math>$1250times $ </tex-math></inline-formula>\u0000. Considering all the above, our work could be a great candidate to be employed in future multicore systems supplying varying voltages and resisting side-channel attacks. It is the first work bridging the gap between on-chip power management and side-channel security.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"261-274"},"PeriodicalIF":2.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
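The security metric quoted above is a correlation between the externally visible input power trace and the internal encryption load current. The toy NumPy snippet below only illustrates how such a correlation figure is computed on synthetic traces; the traces and the "protection" factor are random stand-ins, not measurements of the proposed converter.

```python
# Pearson correlation between a (synthetic) encryption load current and the power trace
# seen at the converter input, before and after a hypothetical decorrelating supply.
import numpy as np

rng = np.random.default_rng(7)
load_current = rng.normal(1.0, 0.2, 10_000)                      # data-dependent load (stand-in)
unprotected = load_current + rng.normal(0, 0.05, 10_000)         # input trace tracks the load
protected = 0.01 * load_current + rng.normal(0, 0.05, 10_000)    # output switching decorrelates it

corr_unprot = abs(np.corrcoef(load_current, unprotected)[0, 1])
corr_prot = abs(np.corrcoef(load_current, protected)[0, 1])
print(f"correlation {corr_unprot:.3f} -> {corr_prot:.3f} "
      f"(~{corr_unprot / corr_prot:.0f}x reduction in this toy model)")
```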
Robust Hardware Trojan Detection Method by Unsupervised Learning of Electromagnetic Signals
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-24. DOI: 10.1109/TVLSI.2024.3458892
Daehyeon Lee;Junghee Lee;Younggiu Jung;Janghyuk Kauh;Taigon Song
{"title":"Robust Hardware Trojan Detection Method by Unsupervised Learning of Electromagnetic Signals","authors":"Daehyeon Lee;Junghee Lee;Younggiu Jung;Janghyuk Kauh;Taigon Song","doi":"10.1109/TVLSI.2024.3458892","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3458892","url":null,"abstract":"This article explores the threat posed by Hardware Trojans (HTs), malicious circuits clandestinely embedded in hardware akin to software backdoors. Activation by attackers renders these Trojans capable of inducing malfunctions or leaking confidential information by manipulating the hardware’s normal operation. Despite robust software security, detecting and ensuring normal hardware operation becomes challenging in the presence of malicious circuits. This issue is particularly acute in weapon systems, where HTs can present a significant threat, potentially leading to immediate disablement in adversary countries. Given the severe risks associated with HTs, detection becomes imperative. The study focuses on demonstrating the efficacy of deep learning-based HT detection by comparing and analyzing methods using deep learning with existing approaches. This article proposes utilizing the deep support vector data description (Deep SVDD) model for HT detection. The proposed method outperforms existing methods when detecting untrained HTs. It achieves 92.87% of accuracy on average, which is higher than that of an existing method, 50.00%. This finding contributes valuable insights to the field of hardware security and lays the foundation for practical applications of Deep SVDD in real-world scenarios.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2327-2340"},"PeriodicalIF":2.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10689630","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821255","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
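For orientation, the snippet below is a minimal one-class training loop in the spirit of Deep SVDD using PyTorch: a small encoder is trained to pull (stand-in) Trojan-free EM traces toward a fixed center, and the distance to that center then serves as an anomaly score. Network size, data, and scoring threshold are placeholders, not the authors' model or dataset; the original method also constrains the network (e.g., no bias terms) to avoid collapsed solutions, which is omitted here.

```python
# One-class anomaly scoring in the spirit of Deep SVDD on synthetic "EM trace" vectors.
import torch
from torch import nn

torch.manual_seed(0)
clean = torch.randn(512, 64)                     # stand-in EM traces from Trojan-free chips
encoder = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 8))

with torch.no_grad():
    center = encoder(clean).mean(dim=0)          # fix the hypersphere center up front

opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for _ in range(200):                             # minimize mean squared distance to the center
    opt.zero_grad()
    loss = ((encoder(clean) - center) ** 2).sum(dim=1).mean()
    loss.backward()
    opt.step()

def anomaly_score(traces):
    """Distance to the learned center; large values suggest a possible Hardware Trojan."""
    with torch.no_grad():
        return ((encoder(traces) - center) ** 2).sum(dim=1)

suspect = clean[:4] + 2.0                        # crude stand-in for Trojan-affected traces
print(anomaly_score(clean[:4]), anomaly_score(suspect))
```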
Bitstream Database-Driven FPGA Programming Flow Based on Standard OpenCL
IF 2.8, Zone 2, Engineering & Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems. Pub Date: 2024-09-24. DOI: 10.1109/TVLSI.2024.3458062
Topi Leppänen;Leevi Leppänen;Joonas Multanen;Pekka Jääskeläinen
{"title":"Bitstream Database-Driven FPGA Programming Flow Based on Standard OpenCL","authors":"Topi Leppänen;Leevi Leppänen;Joonas Multanen;Pekka Jääskeläinen","doi":"10.1109/TVLSI.2024.3458062","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3458062","url":null,"abstract":"Field-programmable gate array (FPGA) vendors provide high-level synthesis (HLS) compilers with accompanying OpenCL runtimes to enable easier use of their devices by non-hardware experts. However, the current runtimes provided by the vendors are not OpenCL-compliant, limiting the application portability and making it difficult to integrate FPGA devices in heterogeneous computing platforms. We propose an automated FPGA management tool AFOCL, with a guiding principle that the software programmer should only need to use the standard OpenCL API to manage FPGA acceleration tasks. This improves portability since the same OpenCL program will work on any OpenCL-compliant computation device able to execute the same kernels, including CPUs, GPUs, and FPGAs. The proposed approach is based on pre-optimized FPGA bitstreams implementing well-defined OpenCL built-in kernels. This enables a clean separation of responsibilities between a hardware developer preparing the FPGA bitstreams containing the kernel implementations, a software developer launching computation tasks as OpenCL built-in kernels, and a bitstream distributor providing preoptimized FPGA IPs to end-users. The automated FPGA programming tool fetches bitstream files as needed from the distributor, reconfigures the FPGA, and manages the communication with the accelerator. We demonstrate that it is possible to achieve similar performance as the current FPGA vendor OpenCL implementations, while abstracting all FPGA-specific details from the software programmer. The cross-vendor potential of AFOCL is shown by porting the implementation to FPGAs from two different vendors (AMD and Altera), and to two different FPGA types [PCIe and system-on-chip (SoC)], and controlling all these systems with the same OpenCL host program.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2257-2268"},"PeriodicalIF":2.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10689610","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821279","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
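The portability argument rests on the host side using only the standard OpenCL API. The pyopencl sketch below shows such a vendor-neutral host flow on whatever OpenCL device happens to be available; in the AFOCL flow described above, the run-time source compilation step would instead resolve to a pre-built FPGA bitstream exposing the kernel as an OpenCL built-in kernel, which is not shown here.

```python
# Generic, device-agnostic OpenCL host flow via pyopencl: the same host code runs on any
# OpenCL-compliant device (CPU, GPU, or an FPGA runtime exposing the kernel).
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()                   # pick any available OpenCL device
queue = cl.CommandQueue(ctx)

a = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

# On an AFOCL-style system, this source build would be replaced by a pre-built
# bitstream providing an equivalent built-in kernel.
prg = cl.Program(ctx, """
__kernel void scale(__global const float *a, __global float *out) {
    int i = get_global_id(0);
    out[i] = 2.0f * a[i];
}
""").build()

prg.scale(queue, a.shape, None, a_buf, out_buf)  # enqueue the kernel over 16 work-items
result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)          # blocking read-back of the result buffer
print(result)
```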