A Hybrid Domain and Pipelined Analog Computing Chain for MVM Computation
Authors: Tianzhu Xiong; Yuyang Ye; Xin Si; Jun Yang
Journal: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 33, no. 1, pp. 52-65
Publication date: 2024-09-26
DOI: 10.1109/TVLSI.2024.3439355
URL: https://ieeexplore.ieee.org/document/10695032/
This article presents a stream-architecture, pipelined hybrid computing chain for matrix-vector multiplication (MVM). Each stage of the chain is a primary multiply-accumulate (MAC) stage, built from charge-, time-, and digital-domain processing units, that performs signed or unsigned $8\times 1\times 8$ bit MAC operations and MSB quantization. Because of the stream architecture, the length of the computing chain can be configured to fit different MVM applications. The charge-domain MAC unit implements a double-plate sampling and weighted capacitor array, using a 7T bitcell with enhanced write yield and efficiency and a three-step weighting scheme. To exploit the speed and resolution advantages of time-domain computing, a high-linearity voltage-to-time converter (VTC) followed by a dynamic tristate delay chain is proposed to transfer MAC values from the charge domain and store them in the time domain. For fast analog readout, a folding-type, distributed time-to-digital converter (TDC) is proposed. To fully eliminate the offset and variation in the distributed TDC, a specific residue readout timing and a back-end calibration scheme are applied. In the digital domain, a double-input, double-clock dynamic D flip-flop performs partial-sum transmission and accumulation in a single cycle with low energy and area consumption. Post-simulation results show that the computing chain achieves 20.89–40.72 TOPS/W energy efficiency and 4.498 TOPS/mm$^2$ throughput.
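The dataflow described in the abstract, where each stage adds its MAC result to a partial sum received from the previous stage and the chain length is chosen to fit the MVM size, can be illustrated with a purely behavioral sketch. This is not the authors' design: the stage width `STAGE_WIDTH = 8`, the function names `mac_stage` and `chain_mvm`, and the use of exact integer arithmetic are all assumptions made for illustration. The analog charge-domain and time-domain processing, the MSB quantization, and the TDC readout are not modeled.

```python
from typing import List

# Assumption for illustration: each stage handles 8 input elements.
STAGE_WIDTH = 8


def mac_stage(weights: List[List[int]], inputs: List[int],
              partial_in: List[int]) -> List[int]:
    """One hypothetical chain stage.

    weights[r][k] is the weight for output row r and the k-th input of
    this stage. The stage adds its dot products to the incoming partial
    sums and passes the result on.
    """
    return [
        p + sum(w * x for w, x in zip(row, inputs))
        for row, p in zip(weights, partial_in)
    ]


def chain_mvm(matrix: List[List[int]], vector: List[int]) -> List[int]:
    """Compute matrix @ vector by streaming partial sums through stages."""
    partial = [0] * len(matrix)
    # The number of loop iterations plays the role of the chain length:
    # a longer input vector simply needs more stages.
    for start in range(0, len(vector), STAGE_WIDTH):
        stage_weights = [row[start:start + STAGE_WIDTH] for row in matrix]
        stage_inputs = vector[start:start + STAGE_WIDTH]
        partial = mac_stage(stage_weights, stage_inputs, partial)
    return partial


def reference_mvm(matrix: List[List[int]], vector: List[int]) -> List[int]:
    """Direct row-by-row dot products, used only to check the chain."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


if __name__ == "__main__":
    matrix = [[1] * 16, list(range(16))]
    vector = list(range(16))
    print(chain_mvm(matrix, vector) == reference_mvm(matrix, vector))
```

In this sketch a 16-element input vector uses two stages. Changing the vector length changes only the number of stages traversed, which mirrors the configurable chain length mentioned in the abstract.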
About the journal:
The IEEE Transactions on VLSI Systems is published as a monthly journal under the co-sponsorship of the IEEE Circuits and Systems Society, the IEEE Computer Society, and the IEEE Solid-State Circuits Society.
Design and realization of microelectronic systems using VLSI/ULSI technologies require close collaboration among scientists and engineers in the fields of systems architecture, logic and circuit design, chips and wafer fabrication, packaging, testing and systems applications. Generation of specifications, design and verification must be performed at all abstraction levels, including the system, register-transfer, logic, circuit, transistor and process levels.
To address this critical area through a common forum, the IEEE Transactions on VLSI Systems was founded. The editorial board, consisting of international experts, invites original papers that emphasize the novel systems integration aspects of microelectronic systems, including interactions among systems design and partitioning, logic and memory design, digital and analog circuit design, layout synthesis, CAD tools, chip and wafer fabrication, testing and packaging, and systems-level qualification. The coverage of these Transactions therefore focuses on VLSI/ULSI microelectronic systems integration.