IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-11 DOI: 10.1109/TVLSI.2024.3494295
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2024.3494295","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3494295","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10791330","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821185","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Scalable and Efficient Architecture for Binary Polynomial Multiplication in BIKE Utilizing Inter-/Inner-Wise Sparsity and Block-by-Block Pipeline
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-11 DOI: 10.1109/TVLSI.2024.3510541
Jia Hou;Jianfei Wang;Yishuo Meng;Fahong Zhang;Yang Su;Chen Yang
{"title":"A Scalable and Efficient Architecture for Binary Polynomial Multiplication in BIKE Utilizing Inter-/Inner-Wise Sparsity and Block-by-Block Pipeline","authors":"Jia Hou;Jianfei Wang;Yishuo Meng;Fahong Zhang;Yang Su;Chen Yang","doi":"10.1109/TVLSI.2024.3510541","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3510541","url":null,"abstract":"Efficient binary polynomial multiplication (BPM) implementations are crucial for the practical deployment of bit flipping key encapsulation (BIKE) postquantum cryptography (PQC) due to its computation-intensive nature. To speed up BPM, this brief proposes a scalable and efficient architecture. The proposed architecture employs a novel blockwise sparsity algorithm, which segments sparse polynomials into blocks and leverages interblock and inner block sparsity to eliminate invalid computations, thereby significantly reducing computational operations. Moreover, a scalable block-by-block pipeline structure, along with a multibank random access memory (RAM) for sparse polynomials, is designed to effectively process blocks, resulting in substantial enhancement in performance. Experimental results on Xilinx Artix-7 Field-Programmable Gate Arrays (FPGAs) demonstrate significant performance superiority on the proposed architecture, compared with existing approaches. Across different bandwidth settings of 16, 32, 64, or 128, our design can achieve <inline-formula> <tex-math>$4.5\\times \\sim 35.1\\times $ </tex-math></inline-formula>, <inline-formula> <tex-math>$4.9\\times \\sim 78.8\\times $ </tex-math></inline-formula>, <inline-formula> <tex-math>$2.5\\times \\sim 112.7\\times $ </tex-math></inline-formula>, and <inline-formula> <tex-math>$0.5\\times \\sim 164.2\\times $ </tex-math></inline-formula> speedup, respectively. Compared with state-of-the-art works, our design achieves <inline-formula> <tex-math>$2.8\\times \\sim 152.0\\times $ </tex-math></inline-formula> improvements in area efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1457-1461"},"PeriodicalIF":2.8,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
IEEE Foundation - Reflecting on 50 Years of Impact
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-11 DOI: 10.1109/TVLSI.2024.3504313
{"title":"IEEE Foundation - Reflecting on 50 Years of Impact","authors":"","doi":"10.1109/TVLSI.2024.3504313","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3504313","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2408-2408"},"PeriodicalIF":2.8,"publicationDate":"2024-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10791339","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142810597","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Fast Design Optimization of On-Chip Equalizing Links Using Particle Swarm Optimization
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-05 DOI: 10.1109/TVLSI.2024.3508079
Hyoseok Song;Kwangmin Kim;Gain Kim;Byungsub Kim
{"title":"A Fast Design Optimization of On-Chip Equalizing Links Using Particle Swarm Optimization","authors":"Hyoseok Song;Kwangmin Kim;Gain Kim;Byungsub Kim","doi":"10.1109/TVLSI.2024.3508079","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3508079","url":null,"abstract":"We propose a fast algorithm to optimize on-chip equalizing link design utilizing a particle swarm optimization (PSO) method. Finding the optimal design parameters of an equalizing link requires too much computation time, because the dependency between design parameters and performances is too complex, while design space is too large. The proposed algorithm greatly reduces the optimization time by utilizing the superior efficiency of PSO in heuristic search. In experiment, on average, the proposed algorithm optimized a link design <inline-formula> <tex-math>$168\\times $ </tex-math></inline-formula> faster than the previous state-of-the-art result, requiring only 1/256 evaluation counts, and reduced computation time from about 2 h to 45 s.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"1-9"},"PeriodicalIF":2.8,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918377","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A Charge Domain SRAM Computing-in-Memory Macro With Quantized Interval-Optimized ADC and Input Bit-Level Sparsity-Optimized P2O-DAC for 8-b MAC Operation
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-05 DOI: 10.1109/TVLSI.2024.3509432
Shukao Dou;Zupei Gu;Heng You;Yi Zhan;Shushan Qiao;Yumei Zhou
{"title":"A Charge Domain SRAM Computing-in-Memory Macro With Quantized Interval-Optimized ADC and Input Bit-Level Sparsity-Optimized P2O-DAC for 8-b MAC Operation","authors":"Shukao Dou;Zupei Gu;Heng You;Yi Zhan;Shushan Qiao;Yumei Zhou","doi":"10.1109/TVLSI.2024.3509432","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3509432","url":null,"abstract":"Computing-in-memory (CIM) has recently gained significant attention as it achieves high energy efficiency and throughput for deep convolutional neural networks (DCNNs). In this brief, we present a static random access memory (SRAM) CIM macro aimed at improving the energy efficiency of edge devices when performing 8-b multiply-and-accumulate (MAC) operations. The proposed architecture implements the following: 1) a successive approximation register analog-to-digital converter (SAR ADC) readout circuit based on a weight-flip-store (WFS) coding scheme, where energy efficiency is improved by optimizing the quantized interval; 2) an input-relevant partial power-off digital-to-analog converter (P2O-DAC) using input bit-level sparsity to reduce power consumption; and 3) a pipeline structure for interleaving MAC computation and readout operation to minimize the redundancy when loading input data into the CIM array. Our proposed CIM macro is implemented in TSMC 40-nm CMOS technology. Postlayout simulation results show an average macro energy efficiency of 16.8 TOPS/W without input and weight value sparsity.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1467-1471"},"PeriodicalIF":2.8,"publicationDate":"2024-12-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875244","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Efficient Test Architecture Using Hybrid Built-In Self-Test for Processing-in-Memory
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-04 DOI: 10.1109/TVLSI.2024.3504539
Hayoung Lee;Juyong Lee;Sungho Kang
{"title":"An Efficient Test Architecture Using Hybrid Built-In Self-Test for Processing-in-Memory","authors":"Hayoung Lee;Juyong Lee;Sungho Kang","doi":"10.1109/TVLSI.2024.3504539","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3504539","url":null,"abstract":"With the rapid advances in artificial intelligence (AI), the demand for data-intensive analytics has surged. Consequently, extensive research on AI acceleration has been conducted to enhance AI performance. Processing-in-memory (PiM) has emerged as a promising AI acceleration architecture, offering an unprecedented high-bandwidth connection between compute and memory. However, integrating many components in PiM can lead to yield degradation. To address this issue, we propose an efficient test architecture that utilizes a hybrid built-in self-test (BIST) for PiM. This architecture utilizes the structural and operational characteristics of PiM to facilitate testing. It can execute testing through the existing functional paths without requiring any additional hardware implementation in PiM. Furthermore, it achieves a 100% test coverage with the small number of test patterns. In addition, the functionality of self-test can be realized for PiM through reconfiguration of the existing hardware, resulting in a very small area overhead.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"1452-1456"},"PeriodicalIF":2.8,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875220","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Manipulated Lookup Table Method for Efficient High-Performance Modular Multiplier
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-04 DOI: 10.1109/TVLSI.2024.3505920
Anawin Opasatian;Makoto Ikeda
{"title":"Manipulated Lookup Table Method for Efficient High-Performance Modular Multiplier","authors":"Anawin Opasatian;Makoto Ikeda","doi":"10.1109/TVLSI.2024.3505920","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3505920","url":null,"abstract":"Modular multiplication is a fundamental operation in many cryptographic systems, with its efficiency playing a crucial role in the overall performance of these systems. Since many cryptographic systems operate with a fixed modulus, we propose an enhancement to the fixed modulus lookup table (LuT) method used for modular reduction, which we refer to as the manipulated LuT (MLuT) method. Our approach applies to any modulus and has demonstrated comparable performance compared with some specialized reduction algorithms designed for specific moduli. The strength of our proposed method in terms of circuit performance is shown by implementing it on Virtex7 and Virtex Ultrascale+ FPGA as the LUT-based MLuT modular multiplier (LUT-MLuTMM) with generalized parallel counters (GPCs) used in the summation step. In one-stage implementations, our proposed method achieves up to a 90% reduction in area and a 50% reduction in latency compared with the generic LuT method. In multistage implementations, our approach offers the best area-interleaved time product, with improvements of 39%, 13%, and 29% over the current state-of-the-art for ~256-bit, SIKE434, and BLS12-381 modular multipliers, respectively. These results demonstrate the potential of our method for high-performance cryptographic accelerators employing a fixed modulus.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"114-127"},"PeriodicalIF":2.8,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10777922","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918390","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A 0.875–0.95-pJ/b 40-Gb/s PAM-3 Baud-Rate Receiver With One-Tap DFE
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-04 DOI: 10.1109/TVLSI.2024.3507714
Jhe-En Lin;Shen-Iuan Liu
{"title":"A 0.875–0.95-pJ/b 40-Gb/s PAM-3 Baud-Rate Receiver With One-Tap DFE","authors":"Jhe-En Lin;Shen-Iuan Liu","doi":"10.1109/TVLSI.2024.3507714","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3507714","url":null,"abstract":"This article presents a 40-Gb/s (25.6-GBaud) three-level pulse amplitude modulation (PAM-3) baud-rate receiver with one-tap decision-feedback equalizer (DFE). A baud-rate phase detector (BRPD) that locks at the point with zero first postcursor is proposed. In addition, by reusing the BRPD’s error samplers, a weighting coefficient calibration is presented to select the DFE weighting coefficient that maximizes the top level of the eye diagram, thereby improving eye height across different channel losses. An inductorless continuous-time linear equalizer (CTLE) and a variable gain amplifier (VGA) are also included. The VGA adjusts the output common-mode resistance to control data swing, reducing power consumption when the required swing is small. Furthermore, by using the modified summer-merged slicers, the capacitance from the slicers to the VGA is reduced. Finally, a digital clock/data recovery (CDR) circuit is presented, which includes a demultiplexer (DeMUX) with a short delay time to reduce the loop latency. The 40-Gb/s PAM-3 receiver is fabricated in 28-nm CMOS technology. For a 25.6-Gbaud pseudorandom ternary sequence of <inline-formula> <tex-math>$3^{7}$ </tex-math></inline-formula>–1, the measured bit error rate (BER) is below <inline-formula> <tex-math>$10^{-12}$ </tex-math></inline-formula> for channel losses of 9 and 17.5 dB. At a 9-dB loss, total power consumption is 35-mW with a calculated FoM of 0.875-pJ/bit. At 17.5-dB loss, total power consumption is 38-mW with a calculated FoM of 0.95-pJ/bit.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"168-178"},"PeriodicalIF":2.8,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918175","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Embedded Architecture for DDR5 DFE Calibration Based on Channel Stimulus Inversion
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-04 DOI: 10.1109/TVLSI.2024.3505835
Mitchell Cooke;Nicola Nicolici
{"title":"An Embedded Architecture for DDR5 DFE Calibration Based on Channel Stimulus Inversion","authors":"Mitchell Cooke;Nicola Nicolici","doi":"10.1109/TVLSI.2024.3505835","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3505835","url":null,"abstract":"The increase in performance promised by the recent generation of double data rate (DDR) memory, DDR5, is conditioned by addressing its signal integrity challenges. The DDR5 standard specifies a 4-tap decision feedback equalizer (DFE) at the memory receiver to deal with these challenges. Although adaptive equalization is a mature field, known methods for DFE calibration are limited by the DDR5 interface complexity and the equalization requirements mandated by its specification. In this article, we propose a novel approach based on linear inversion of channel stimulus that leverages specific architectural details of DDR5 and can tune memory devices deterministically at runtime. In addition to using few hardware resources relative to a modern memory controller, by operating at very low latency, this new approach facilitates periodic equalization when the DFE is offline, thus avoiding DFE error propagation during training inherent to adaptive techniques.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 3","pages":"793-806"},"PeriodicalIF":2.8,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143489181","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
VSAGE: An End-to-End Automated VCO-Based ΔΣ ADC Generator
IF 2.8 Zone 2 (Engineering and Technology)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-12-04 DOI: 10.1109/TVLSI.2024.3507567
Ken Li;Tian Xie;Tzu-Han Wang;Shaolan Li
{"title":"VSAGE: An End-to-End Automated VCO-Based ΔΣ ADC Generator","authors":"Ken Li;Tian Xie;Tzu-Han Wang;Shaolan Li","doi":"10.1109/TVLSI.2024.3507567","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3507567","url":null,"abstract":"This article presents VSAGE, an agile end-to-end automated voltage-controlled oscillator (VCO)-based <inline-formula> <tex-math>$\\Delta \\Sigma $ </tex-math></inline-formula> analog-to-digital converter (ADC) generator. It exploits time-domain architectures and design mindset, so that the design flow is highly oriented around digital standard cells in contrast to the transistor-level-focused approach in conventional analog design. Through this, it speeds up and simplifies both the synthesis phase and layout phase. Combined with an efficient knowledge-machine learning (ML)-guided synthesis flow, it can translate input specifications to a full system layout with reliable performance within minutes. This work also features a compact oscillator and system modeling method that facilitates light-resource accurate computation and network training. The generator is verified with 12 design cases in 65-nm and 28-nm processes, proving its capability of generating competitive design with good process portability.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 1","pages":"128-139"},"PeriodicalIF":2.8,"publicationDate":"2024-12-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142918379","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0