A Hybrid Domain and Pipelined Analog Computing Chain for MVM Computation

IF 2.8 | Zone 2, Engineering and Technology | JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 1, pp. 52-65. Pub. date: 2024-09-26. DOI: 10.1109/TVLSI.2024.3439355

Tianzhu Xiong, Yuyang Ye, Xin Si, Jun Yang


Citations: 0

Abstract



In this article, a stream-architecture, pipelined hybrid computing chain is presented to process matrix-vector multiplication (MVM). In each stage of the computing chain, a primary multiply-accumulate (MAC) stage, consisting of charge-, time-, and digital-domain processing units, performs signed or unsigned $8\times 1\times 8$-bit MAC operations and MSB quantization. Owing to the stream architecture, the length of the computing chain can be configured to fit different MVM applications. The charge-domain MAC unit implements a double-plate sampling, weighted capacitor array with a 7T bitcell that improves write yield and efficiency, together with a three-step weighting scheme. To exploit the speed and resolution advantages of time-domain computing, a high-linearity voltage-to-time converter (VTC) followed by a dynamic tristate delay chain is proposed to transfer MAC values out of the charge domain and store them in the time domain. For fast analog readout, a folding-type distributed time-to-digital converter (TDC) is proposed. To fully eliminate the offset and variation in the distributed TDC, a dedicated residue readout timing and a back-end calibration scheme are applied. In the digital domain, a double-input, double-clock dynamic D flip-flop is built to perform partial-sum transmission and accumulation in a single cycle with low energy and area consumption. Post-simulation results show that the computing chain achieves an energy efficiency of 20.89–40.72 TOPS/W and a throughput of 4.498 TOPS/mm².
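The abstract describes a chain of stages in which each stage computes a local MAC and forwards a partial sum to the next, with a configurable chain length and pipelined operation. The following is a purely behavioral, cycle-level sketch of that dataflow idea, not the authors' design. It rests on assumptions the source does not state: 8-bit inputs, 1-bit weights, a weight row sliced equally across stages, ideal noise-free arithmetic, and no MSB quantization. It does not model the charge-, time-, or digital-domain circuits, and the names `stage_mac` and `pipelined_chain` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def stage_mac(x: Sequence[int], w: Sequence[int], signed: bool) -> int:
    """Local MAC of one hypothetical stage (assumed 8-bit inputs, 1-bit weights)."""
    lo, hi = (-128, 127) if signed else (0, 255)
    acc = 0
    for xi, wi in zip(x, w):
        if not lo <= xi <= hi:
            raise ValueError(f"input {xi} out of 8-bit range")
        if wi not in (0, 1):
            raise ValueError(f"weight {wi} is not a single bit")
        acc += xi * wi
    return acc


def pipelined_chain(vectors: Sequence[Sequence[int]], w_row: Sequence[int],
                    chain_length: int, signed: bool = False) -> Tuple[List[int], int]:
    """Cycle-level sketch of a configurable-length chain computing one output row.

    Each stage owns an equal slice of the weight row (an assumption). Every cycle,
    stage s adds its local MAC to the partial sum latched by stage s-1, so up to
    `chain_length` input vectors are in flight at once.
    Returns (outputs in input order, number of cycles used).
    """
    if chain_length <= 0 or len(w_row) % chain_length:
        raise ValueError("row length must be a positive multiple of chain_length")
    seg = len(w_row) // chain_length
    # regs[s] holds (vector index, partial sum) latched at the output of stage s.
    regs: List[Optional[Tuple[int, int]]] = [None] * chain_length
    outputs: List[int] = []
    cycle = 0
    while len(outputs) < len(vectors):
        nxt: List[Optional[Tuple[int, int]]] = [None] * chain_length
        for s in range(chain_length):
            prev = regs[s - 1] if s > 0 else None
            if s == 0:
                if cycle >= len(vectors):
                    continue  # no new vector left to inject
                idx, carried = cycle, 0
            elif prev is None:
                continue  # pipeline bubble: nothing arrived from stage s-1
            else:
                idx, carried = prev
            base = s * seg
            local = stage_mac(vectors[idx][base:base + seg], w_row[base:base + seg], signed)
            nxt[s] = (idx, carried + local)
        last = nxt[-1]
        if last is not None:
            outputs.append(last[1])  # final stage emits a finished dot product
        regs = nxt
        cycle += 1
    return outputs, cycle


if __name__ == "__main__":
    w = [1, 0, 1, 1, 0, 1, 1, 0]
    xs = [[10, 20, 30, 40, 50, 60, 70, 80],
          [1, 2, 3, 4, 5, 6, 7, 8],
          [-5, 5, -5, 5, -5, 5, -5, 5]]
    outs, cycles = pipelined_chain(xs, w, chain_length=4, signed=True)
    print(outs, cycles)  # expected: [210, 21, -5] 6
```

With N input vectors and a chain of L stages, this sketch finishes in N + L - 1 cycles: the first result appears after L cycles, then one result per cycle. Changing `chain_length` only re-slices the row across stages, which loosely mirrors the configurable chain length described in the abstract.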


Source journal

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (Engineering and Technology: Engineering, Electrical and Electronic)

CiteScore: 6.40

Self-citation rate: 7.10%

Articles published: 187

Review time: 3.6 months

Journal description: The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society. Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chip and wafer fabrication, packaging, testing, and systems applications. Generation of specifications, design, and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor, and process levels. The IEEE Transactions on VLSI Systems was founded to address this critical area through a common forum. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems-integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. The coverage of these Transactions therefore focuses on VLSI/ULSI microelectronic systems integration.