IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

An Efficient and Precision-Reconfigurable Digital CIM Macro for DNN Accelerators
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-24 DOI: 10.1109/TVLSI.2024.3455091
Dingyang Zou;Gaoche Zhang;Xu Zhang;Meiqi Wang;Zhongfeng Wang
{"title":"An Efficient and Precision-Reconfigurable Digital CIM Macro for DNN Accelerators","authors":"Dingyang Zou;Gaoche Zhang;Xu Zhang;Meiqi Wang;Zhongfeng Wang","doi":"10.1109/TVLSI.2024.3455091","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3455091","url":null,"abstract":"Due to the demand for high energy efficiency in deep neural network (DNN) accelerators, computing-in-memory (CIM) is becoming increasingly popular in recent years. However, current CIM designs suffer from high latency and insufficient flexibility. To address the issues, this brief proposes a Booth-multiplication-based CIM macro (BCIM) with modified Booth encoding and partial product (PP) generation method specially designed for CIM architecture. In addition, a methodology is presented for designing precision-reconfigurable digital CIM macros. We also optimize the precision-reconfigurable shift adder in the macro based on the cutting down carry connection method. The design attains a performance of 2048 GOPS and a peak energy efficiency of 79.15 TOPS/W in the signed INT4 mode at a frequency of 500 MHz.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"563-567"},"PeriodicalIF":2.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142993413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
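The BCIM abstract above builds on Booth multiplication. As a rough illustration only, the following sketch shows textbook radix-4 Booth recoding for signed integers such as INT4. It is not the paper's modified encoding or its partial-product generation scheme, and the function names are hypothetical.

```python
def booth_radix4_digits(multiplier: int, bits: int = 4) -> list[int]:
    """Recode a signed two's-complement multiplier into radix-4 Booth digits.

    Each digit lies in {-2, -1, 0, 1, 2}; digit i carries weight 4**i.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # standard radix-4 Booth digit
        prev = b1
    return digits


def booth_multiply(a: int, b: int, bits: int = 4) -> int:
    """Multiply a by b by summing one shifted partial product per Booth digit."""
    return sum((d * a) << (2 * i) for i, d in enumerate(booth_radix4_digits(b, bits)))
```

With a 4-bit multiplier this recoding yields two partial products instead of four, which is the general reason Booth-based schemes are attractive for multiply-accumulate hardware.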
SCOPE: Schoolbook-Originated Novel Polynomial Multiplication Accelerators for NTRU-Based PQC
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-24 DOI: 10.1109/TVLSI.2024.3458872
Yazheng Tu;Shi Bai;Jinjun Xiong;Jiafeng Xie
{"title":"SCOPE: Schoolbook-Originated Novel Polynomial Multiplication Accelerators for NTRU-Based PQC","authors":"Yazheng Tu;Shi Bai;Jinjun Xiong;Jiafeng Xie","doi":"10.1109/TVLSI.2024.3458872","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3458872","url":null,"abstract":"The <italic>N</italic>th-degree truncated polynomial ring units (NTRUs)-based postquantum cryptography (PQC) has drawn significant attention from the research communities, e.g., the National Institute of Standards and Technology (NIST) PQC standardization process selected algorithm Fast Fourier lattice-based compact (Falcon). Following the research trend, efficient hardware accelerator design for polynomial multiplication (an important component of the NTRU-based PQC) is crucial. Unlike the commonly used number theoretic transform (NTT) method, in this article, we have presented a novel SChoolbook-Originated Polynomial multiplication accElerators (SCOPE) design framework. Overall, we have proposed the schoolbook-based method in an innovative format to implement the targeted polynomial multiplication, first through a schoolbook-variant version and then through a Toeplitz matrix-vector product (TMVP)-based approach. Four layers of coherent and interdependent efforts have been carried out: 1) a novel lookup table (LUT)-based point-wise multiplier is proposed along with a related modular reduction technique to obtain optimal implementation; 2) a new hardware accelerator is introduced for the targeted polynomial multiplication, deploying the proposed point-wise multiplier; 3) the proposed architecture is extended to a TMVP-based polynomial multiplication accelerator; and 4) the efficiency of the proposed accelerators is demonstrated through implementation and comparison. Finally, the proposed design strategy is also extended to another NTRU-based scheme and other schoolbook- and toom-cook-based polynomial multiplications (used in other PQC), and obtains the same superior performance. 
We hope that the outcome of this research can impact the ongoing NIST PQC standardization process and related full-hardware implementation work for schemes like Falcon.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"408-420"},"PeriodicalIF":2.8,"publicationDate":"2024-09-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992936","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
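The SCOPE abstract above contrasts schoolbook multiplication with a Toeplitz matrix-vector product (TMVP) formulation. The sketch below shows, in plain software terms, that the two compute the same result in a ring of the form Z_q[x]/(x^n + 1). The ring, the modulus 12289, and the function names are assumptions chosen for illustration; this is not the paper's LUT-based multiplier or hardware architecture.

```python
def schoolbook_negacyclic(a: list[int], b: list[int], q: int) -> list[int]:
    """Schoolbook product of a and b modulo (x^n + 1) and modulo q."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                # x^n = -1, so terms that wrap around change sign
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c


def tmvp_negacyclic(a: list[int], b: list[int], q: int) -> list[int]:
    """Same product expressed as a Toeplitz matrix (built from a) times vector b."""
    n = len(a)
    result = []
    for row in range(n):
        acc = 0
        for col in range(n):
            # Entry depends only on (row - col), which makes the matrix Toeplitz
            entry = a[row - col] if row >= col else -a[n + row - col]
            acc += entry * b[col]
        result.append(acc % q)
    return result
```

Both functions cost O(n^2) multiplications here; the value of the TMVP view in hardware is that Toeplitz structure admits regular, splittable datapaths.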
Keelhaul: Processor-Driven Chip Connectivity and Memory Map Metadata Validator for Large Systems-on-Chip
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-23 DOI: 10.1109/TVLSI.2024.3454431
Henri Lunnikivi;Roni Hämäläinen;Timo D. Hämäläinen
{"title":"Keelhaul: Processor-Driven Chip Connectivity and Memory Map Metadata Validator for Large Systems-on-Chip","authors":"Henri Lunnikivi;Roni Hämäläinen;Timo D. Hämäläinen","doi":"10.1109/TVLSI.2024.3454431","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3454431","url":null,"abstract":"The integration of large-scale systems-on-chip warrants thorough verification both at the level of the individual component and at the system level. In this article, we address the automated testing of system-level memory maps. The golden reference is the IEEE 1685/IP-XACT hardware description, which includes implementation agnostic definitions for the global memory map. The IP-XACT description is used as a specification for implementing the registers and memory regions in a register transfer-level (RTL) language, and for implementing the corresponding hardware-dependent software. The challenge is that hardware design changes might not always propagate to firmware and applications developers, which causes errors and faults. We present a method and a tool called Keelhaul which takes as input the CMSIS-SVD format commonly used for firmware development and generates automated software tests that attempt to access all available memory mapped input/output registers. During development of a large-scale research-focused multiprocessor system-on-chip, we ran a total of 32 automatically generated test suites per pipeline comprising 882 test cases for each of its two CPU subsystems. A total of 15 distinct issues were found by the tool in the lead-up to tapeout. Another research-focused SoC was validated posttapeout with 984 test cases generated for each core, resulting in the discovery of four distinct issues. 
Keelhaul can be used with any IP-XACT or CMSIS-SVD-based systems-on-chip that include processors for accessing implemented registers and memory regions.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2269-2280"},"PeriodicalIF":2.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821278","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
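The Keelhaul abstract above describes generating register-access tests from a CMSIS-SVD description. A minimal sketch of that idea follows, assuming a tiny made-up SVD fragment and emitting C-style read stubs. The sample device, the function names, and the emitted test shape are all hypothetical; the actual tool's output format is not described in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical SVD fragment using standard CMSIS-SVD element names.
SVD_SAMPLE = """<device>
  <name>demo_soc</name>
  <peripherals>
    <peripheral>
      <name>UART0</name>
      <baseAddress>0x40000000</baseAddress>
      <registers>
        <register><name>DATA</name><addressOffset>0x0</addressOffset></register>
        <register><name>STATUS</name><addressOffset>0x4</addressOffset></register>
      </registers>
    </peripheral>
  </peripherals>
</device>"""


def register_addresses(svd_text: str) -> list[tuple[str, int]]:
    """Return (name, absolute address) for every register in the SVD text."""
    root = ET.fromstring(svd_text)
    found = []
    for periph in root.iter("peripheral"):
        pname = periph.findtext("name")
        base = int(periph.findtext("baseAddress"), 0)  # base 0 accepts 0x prefixes
        for reg in periph.iter("register"):
            offset = int(reg.findtext("addressOffset"), 0)
            found.append((f"{pname}_{reg.findtext('name')}", base + offset))
    return found


def emit_read_tests(regs: list[tuple[str, int]]) -> list[str]:
    """Emit one C function per register that performs a volatile 32-bit read."""
    return [
        f"void test_read_{name}(void) {{ (void)*(volatile uint32_t *)0x{addr:08X}u; }}"
        for name, addr in regs
    ]
```

Running such stubs on the target CPU exercises the real interconnect, so an unmapped or mis-addressed register shows up as a bus fault or hang rather than passing silently.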
A 0.4 V, 12.2 pW Leakage, 36.5 fJ/Step Switching Efficiency Data Retention Flip-Flop in 22 nm FDSOI
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-20 DOI: 10.1109/TVLSI.2024.3453946
Yuxin Ji;Yuhang Zhang;Changyan Chen;Jian Zhao;Fakhrul Zaman Rokhani;Yehea Ismail;Yongfu Li
{"title":"A 0.4 V, 12.2 pW Leakage, 36.5 fJ/Step Switching Efficiency Data Retention Flip-Flop in 22 nm FDSOI","authors":"Yuxin Ji;Yuhang Zhang;Changyan Chen;Jian Zhao;Fakhrul Zaman Rokhani;Yehea Ismail;Yongfu Li","doi":"10.1109/TVLSI.2024.3453946","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3453946","url":null,"abstract":"Data-retention flip-flops (DR-FFs) efficiently maintain data during sleep mode, and retain state during transitions between active and sleep mode. This brief proposes an ultralow power DR-FF design with an improved autonomous data-retention (ADR) latch operating with a supply voltage range down to near/subthreshold, achieving a sleep mode leakage power of 12.2 pW, <inline-formula> <tex-math>$1.4\\times $ </tex-math></inline-formula>–<inline-formula> <tex-math>$3.8\\times $ </tex-math></inline-formula> less than the prior CMOS DR-FFs. Our proposed DR-FFs consume the lowest active mode switching efficiency of 36.5 fJ/step, <inline-formula> <tex-math>$1.2\\times $ </tex-math></inline-formula>–<inline-formula> <tex-math>$4\\times $ </tex-math></inline-formula> less than the prior works, and a comparable transition efficiency of 1.9 fJ/step. 
Furthermore, our proposed DR-FFs require minimal control signals, logic gates, and switches, significantly reducing design complexity, and avoiding the drawbacks of nonvolatile data retention FFs (NV-FFs).","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"573-577"},"PeriodicalIF":2.8,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142992940","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Quasi-Adiabatic Clock Networks in 3-D Voltage Stacked Systems
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-19 DOI: 10.1109/TVLSI.2024.3448374
Andres Ayes;Eby G. Friedman
{"title":"Quasi-Adiabatic Clock Networks in 3-D Voltage Stacked Systems","authors":"Andres Ayes;Eby G. Friedman","doi":"10.1109/TVLSI.2024.3448374","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3448374","url":null,"abstract":"Power delivery in three-dimensional (3-D) integrated systems poses several challenges such as high current densities, large voltage drops due to multiple levels of resistive vertical interconnect, and significant switching noise originating from transient currents within different layers. Voltage stacking is a power delivery technique that is highly compatible with 3-D integration due to the physical proximity between layers, enabling the efficient transfer of recycled current. Power noise in clock networks is, however, not inherently addressed by 3-D voltage stacking. In this brief, a quasi-adiabatic technique between multiple clock networks within 3-D voltage stacked systems is proposed. The technique exploits the proximity of the clock networks to enable mutual charging and discharging when the clock signals transition to the same voltage. During this transition, the clock distribution networks are isolated from the power grid, reducing simultaneous switching noise and current load. The maximum current is reduced by an additional 13% as compared to only voltage stacking, the maximum voltage noise is reduced by up to 72% when the clock networks are isolated from the power grids, and the clock networks pull nearly 50% less charge from the source. 
The proposed technique is evaluated on a 7 nm predictive technology model.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2394-2397"},"PeriodicalIF":2.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142821254","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
The Error Analysis of Bit Weight Self-Calibration Methods for High-Resolution SAR ADCs
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-19 DOI: 10.1109/TVLSI.2024.3458071
Yanhang Chen;Siji Huang;Qifeng Huang;Yifei Fan;Jie Yuan
{"title":"The Error Analysis of Bit Weight Self-Calibration Methods for High-Resolution SAR ADCs","authors":"Yanhang Chen;Siji Huang;Qifeng Huang;Yifei Fan;Jie Yuan","doi":"10.1109/TVLSI.2024.3458071","DOIUrl":"https://doi.org/10.1109/TVLSI.2024.3458071","url":null,"abstract":"High-resolution successive approximation register (SAR) analog-to-digital converters (ADCs) commonly need to calibrate their bit weights. Due to the nonidealities of the calibration circuits, the calibrated bit weights carry errors. This error could propagate during the calibration procedure. Due to the high precision requirement of these ADCs, such residue error commonly becomes the signal-to-noise-and-distortion ratio (SNDR) bottleneck of the overall ADC. This article presents an analysis of the residue error from bit weight self-calibration methods of high-resolution SAR ADCs. The major sources contributing to this error and the error reduction methods are quantitively analyzed. A statistical analysis of the noise-induced random error is developed. Our statistical model finds that the noise-induced random error follows the chi-square distribution. In practice, this random error is commonly reduced by repetitively measuring and averaging the calibrated bit weights. Our statistical model quantifies this bit weight error and leads to a clearer understanding of the error mechanism and design trade-offs. Following our chi-square model, the SNDR degradation due to the circuit noise during the calibration can be easily estimated without going through the time-consuming traditional transistor-level design and simulation process. The required repetition time can also be calculated. The bit-weight error models derived in this article are verified with measurement on a 16-bit SAR ADC design in a 180-nm CMOS process. 
Results from our model match both simulations and measurements well.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"1983-1992"},"PeriodicalIF":2.8,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142518122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
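The SAR ADC abstract above notes that noise-induced bit-weight error is reduced by repeated measurement and averaging. The Monte Carlo sketch below illustrates only that averaging effect, assuming independent zero-mean Gaussian noise on each measurement; this noise model and the function name are assumptions, not the paper's derivation. Under the assumption, the squared error of an N-sample average, scaled by N/sigma^2, is chi-square distributed with one degree of freedom, so its mean is sigma^2/N.

```python
import random


def mean_squared_weight_error(sigma: float, repeats: int,
                              trials: int = 20000, seed: int = 0) -> float:
    """Estimate the mean squared error of a bit weight averaged over `repeats` noisy reads."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    total = 0.0
    for _ in range(trials):
        # Error of one calibration run: the average of `repeats` noise samples
        avg_noise = sum(rng.gauss(0.0, sigma) for _ in range(repeats)) / repeats
        total += avg_noise ** 2
    return total / trials
```

Comparing `mean_squared_weight_error(1.0, 1)` with `mean_squared_weight_error(1.0, 16)` should show roughly a 16-fold reduction in error power, which is the trade-off between calibration time and residual error that the abstract refers to.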
MCAIMem: A Mixed SRAM and eDRAM Cell for Area and Energy-Efficient On-Chip AI Memory
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-18 DOI: 10.1109/TVLSI.2024.3439231
Duy-Thanh Nguyen;Abhiroop Bhattacharjee;Abhishek Moitra;Priyadarshini Panda
{"title":"MCAIMem: A Mixed SRAM and eDRAM Cell for Area and Energy-Efficient On-Chip AI Memory","authors":"Duy-Thanh Nguyen;Abhiroop Bhattacharjee;Abhishek Moitra;Priyadarshini Panda","doi":"10.1109/TVLSI.2024.3439231","DOIUrl":"10.1109/TVLSI.2024.3439231","url":null,"abstract":"AI chips commonly employ SRAM memory as buffers for their reliability and speed, which contribute to high performance. However, SRAM is expensive and demands significant area and energy consumption. Previous studies have explored replacing SRAM with emerging technologies, such as nonvolatile memory, which offers fast read memory access and a small cell area. Despite these advantages, nonvolatile memory’s slow write memory access and high write energy consumption prevent it from surpassing SRAM performance in AI applications with extensive memory access requirements. Some research has also investigated embedded dynamic random access memory (eDRAM) as an area-efficient on-chip memory with similar access times as SRAM. Still, refresh power remains a concern, leaving the trade-off among performance, area, and power consumption unresolved. To address this issue, this article presents a novel mixed CMOS cell memory design that balances performance, area, and energy efficiency for AI memory by combining SRAM and eDRAM cells. We consider the proportion ratio of one SRAM and seven eDRAM cells in the memory to achieve area reduction using mixed CMOS cell memory. In addition, we capitalize on the characteristics of deep neural network (DNN) data representation and integrate asymmetric eDRAM cells to lower energy consumption. To validate our proposed MCAIMem solution, we conduct extensive simulations and benchmarking against traditional SRAM. Our results demonstrate that the MCAIMem significantly outperforms these alternatives in terms of area and energy efficiency. 
Specifically, our MCAIMem can reduce the area by 48% and energy consumption by <inline-formula> <tex-math>$3.4\\times $ </tex-math></inline-formula> compared with SRAM designs, without incurring any accuracy loss.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 11","pages":"2023-2036"},"PeriodicalIF":2.8,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Marmotini: A Weight Density Adaptation Architecture With Hybrid Compression Method for Spiking Neural Network
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-18 DOI: 10.1109/TVLSI.2024.3453897
Zilin Wang;Yi Zhong;Zehong Ou;Youming Yang;Shuo Feng;Guang Chen;Xiaoxin Cui;Song Jia;Yuan Wang
{"title":"Marmotini: A Weight Density Adaptation Architecture With Hybrid Compression Method for Spiking Neural Network","authors":"Zilin Wang;Yi Zhong;Zehong Ou;Youming Yang;Shuo Feng;Guang Chen;Xiaoxin Cui;Song Jia;Yuan Wang","doi":"10.1109/TVLSI.2024.3453897","DOIUrl":"10.1109/TVLSI.2024.3453897","url":null,"abstract":"Brain-inspired spiking neural network (SNN) has recently attracted widespread interest owing to its event-driven nature and relatively low-power hardware for transmitting highly sparse binary spikes. To further improve energy efficiency, some matrix compression algorithms are used for weight storage. However, the weight sparsity of different layers varies greatly. For a multicore neuromorphic system, it is difficult for the same compression algorithm to adapt to all the layers of SNN model. In this work, we propose a weight density adaptation architecture with hybrid compression method for SNN, named Marmotini. It is a multicore heterogeneous design, including three types of cores to complete computation of different weight sparsity. Benefiting from the hybrid compression method, Marmotini minimizes the waste of neurons and weights as much as possible. Besides, for better flexibility, a reconfigurable core that can be configured to compute convolutional layer or fully connected layer is proposed. 
Implemented on Xilinx Kintex UltraScale XCKU115 field-programmable gate array (FPGA) board, Marmotini can operate at 150-MHz frequency, achieving 244.6-GSOP/s peak performance and 54.1-GSOP/W energy efficiency at 0% spike sparsity.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2293-2302"},"PeriodicalIF":2.8,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266250","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A 22-nm 264-GOPS/mm$^{2}$ 6T SRAM and Proportional Current Compute Cell-Based Computing-in-Memory Macro for CNNs
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-18 DOI: 10.1109/TVLSI.2024.3446045
Feiran Liu;Anran Yin;Chen Xue;Bo Wang;Zhongyuan Feng;Han Liu;Xiang Li;Hui Gao;Tianzhu Xiong;Xin Si
{"title":"A 22-nm 264-GOPS/mm2 6T SRAM and Proportional Current Compute Cell-Based Computing-in-Memory Macro for CNNs","authors":"Feiran Liu;Anran Yin;Chen Xue;Bo Wang;Zhongyuan Feng;Han Liu;Xiang Li;Hui Gao;Tianzhu Xiong;Xin Si","doi":"10.1109/TVLSI.2024.3446045","DOIUrl":"10.1109/TVLSI.2024.3446045","url":null,"abstract":"With the rise of artificial intelligence and big data applications, the general-purpose Von Neumann architecture is no longer capable of fulfilling the requirements of these application scenarios. The large amount of parallelizable and repeatable multiply-and-accumulate (MAC) operations in deep neural networks provide the possibility for the emergence of storage-computing integrated architectures. Current-based computation and quantization are employed to circumvent signal margin limitations on the power supply voltage of the computing unit, thereby facilitating low-power design. The proposed design is a computing-in-memory (CIM) circuit based on current sampling accumulation and applies a current-sensing analog-to-digital converter design that exhibits reduced sensitivity to parasitic capacitance compared to voltage-based analog-to-digital converters. Its power consumption is proportional to the input current, achieving higher area efficiency and energy efficiency gains. The design of the CIM circuit based on the current sampling in the 22-nm FDSOI process is fabricated with an area efficiency of 264 GOPS/mm2. 
The peak energy efficiency is 20.81 TOPS/W, and the inference accuracy reaches 92.11% when employed to VGG-16 under CIFAR-10 dataset.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"32 12","pages":"2389-2393"},"PeriodicalIF":2.8,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266251","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An Interpolation-Free Fractional Motion Estimation Algorithm and Hardware Implementation for VVC
IF 2.8, Zone 2, Engineering Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2024-09-17 DOI: 10.1109/TVLSI.2024.3455374
Shushi Chen;Leilei Huang;Zhao Zan;Xiaoyang Zeng;Yibo Fan
{"title":"An Interpolation-Free Fractional Motion Estimation Algorithm and Hardware Implementation for VVC","authors":"Shushi Chen;Leilei Huang;Zhao Zan;Xiaoyang Zeng;Yibo Fan","doi":"10.1109/TVLSI.2024.3455374","DOIUrl":"10.1109/TVLSI.2024.3455374","url":null,"abstract":"Versatile video coding (VVC) introduces multi-type tree (MTT) and larger coding tree unit (CTU) to improve compression efficiency compared to its predecessor High Efficiency Video Coding (HEVC). This leads to higher throughput for fractional motion estimation (FME) to meet the needs of real-time processing. In this context, this article proposes an interpolation-free algorithm based on an error surface to improve the throughput of FME hardware. The error surface is constructed by the rate-distortion costs (RDCs) of the integer motion vector (IMV) and its neighbors. To improve the prediction accuracy, a hardware-friendly RDC estimation strategy is proposed to construct the error surface. The experimental results show that the corresponding Bjontegaard Delta Bit Rate (BDBR) in Random Access (RA), Low Delay P (LDP) and Low Delay B (LDB) configuration increases by only 0.358%, 0.479%, and 0.511% compared with the VVC test model (VTM) 16.0. Compared with the default FME algorithms of VVC, the time cost of FME is reduced by 53.47%, 56.28%, and 54.23%, respectively, in RA, LDP, and LDB configurations. The algorithm is free of iteration and interpolation, which can contribute to low-cost and high-throughput hardware. The proposed architecture can support FME of all coding units (CUs) in a CTU with one layer of MTT under the quaternary tree (QT), and the CU size can vary from <inline-formula> <tex-math>$8\\times 8$ </tex-math></inline-formula> to <inline-formula> <tex-math>$128\\times 128$ </tex-math></inline-formula>. 
Synthesized using GF 28-nm process, the architecture can achieve <inline-formula> <tex-math>$7680\\times 4320$ </tex-math></inline-formula>@60 fps throughput at 800 MHz, with a gate count of 244 K and power consumption of 76.5 mW. This proposed architecture can meet the real-time coding requirements of VVC.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 2","pages":"395-407"},"PeriodicalIF":2.8,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142266249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
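The VVC abstract above estimates the fractional motion vector from an error surface built on the costs at the integer position and its neighbors, with no interpolation. A common simplified version of this idea fits a separable 1-D parabola through three costs per axis. The sketch below uses that generic model; the parabola model, the clamp range, the quarter-pel rounding, and the function names are assumptions, and the paper's actual surface model and cost estimation may differ.

```python
def parabola_offset(cost_minus: float, cost_zero: float, cost_plus: float) -> float:
    """Sub-pixel location of the minimum of a parabola through costs at x = -1, 0, +1."""
    curvature = cost_minus - 2.0 * cost_zero + cost_plus
    if curvature <= 0:
        return 0.0  # flat or concave fit: stay at the integer position
    offset = 0.5 * (cost_minus - cost_plus) / curvature
    return max(-0.75, min(0.75, offset))  # assumed search range around the integer MV


def quarter_pel(offset: float) -> float:
    """Round a fractional offset to the nearest quarter-pixel step."""
    return round(offset * 4) / 4


def fractional_mv(cost_left: float, cost_right: float, cost_up: float,
                  cost_down: float, cost_center: float) -> tuple[float, float]:
    """Estimate (dx, dy) in pixels from five costs, treating the axes independently."""
    dx = quarter_pel(parabola_offset(cost_left, cost_center, cost_right))
    dy = quarter_pel(parabola_offset(cost_up, cost_center, cost_down))
    return dx, dy
```

Because the estimate is a closed-form expression of costs that the integer search has already computed, no fractional reference samples need to be generated, which is the source of the throughput benefit the abstract describes.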