Arvon: A Heterogeneous SiP Integrating a 14nm FPGA and Two 22nm 1.8TFLOPS/W DSPs with 1.7Tbps/mm2 AIB 2.0 Interface to Provide Versatile Workload Acceleration
Wei Tang, Sung-gun Cho, T. Hoang, Jacob Botimer, Wei Qiang Zhu, Ching-Chi Chang, Cheng-Hsun Lu, Junkang Zhu, Yaoyu Tao, Tianyu Wei, Naomi Kavi Motwani, Mani Yalamanchi, Ramya Yarlagadda, S. Kale, Mark Flannigan, Allen Chan, Thungoc Tran, Sergey Y. Shumarayev, Zhengya Zhang
2023 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), published 2023-06-11
DOI: 10.23919/VLSITechnologyandCir57934.2023.10185388 (https://doi.org/10.23919/VLSITechnologyandCir57934.2023.10185388)
Abstract
Arvon is a heterogeneous system-in-package (SiP) that integrates a 14nm FPGA chiplet with two dense, efficient 22nm DSP chiplets through Embedded Multi-die Interconnect Bridges (EMIBs), as illustrated in Fig. 1. The chiplets communicate via a 1.536Tbps Advanced Interface Bus (AIB) 1.0 interface and a 7.68Tbps AIB 2.0 interface. We demonstrate the first AIB 2.0 I/O prototype, built on $36 \mu \mathrm{m}$-pitch microbumps, which achieves 4Gbps/pin at 0.10pJ/b (0.46pJ/b including the adapter) and a bandwidth density of 1.024Tbps/mm of shoreline and 1.705Tbps/mm$^2$ of area. Arvon is programmable and supports workloads ranging from neural networks (NN) to communication processing (comm); each DSP chiplet delivers a peak performance of 4.14TFLOPS (FP16, half-precision floating point) at 1.8TFLOPS/W. A compilation flow maps workloads across the FPGA and DSPs to optimize performance and utilization.
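The headline figures in the abstract can be cross-checked with some back-of-envelope arithmetic. The sketch below is illustrative only: the derived quantities (pin count, I/O power, shoreline length, bump area, DSP power) are not reported in the abstract. They follow only under the assumptions that every AIB 2.0 data pin runs at the full 4Gbps, that the density figures apply to the whole 7.68Tbps interface, and that the peak throughput and the efficiency of a DSP chiplet were measured at the same operating point. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract.
AIB2_BANDWIDTH_TBPS = 7.68
AIB2_PIN_RATE_GBPS = 4.0
IO_ENERGY_PJ_PER_BIT = 0.10
IO_ENERGY_WITH_ADAPTER_PJ_PER_BIT = 0.46
SHORELINE_DENSITY_TBPS_PER_MM = 1.024
AREA_DENSITY_TBPS_PER_MM2 = 1.705
DSP_PEAK_TFLOPS = 4.14
DSP_EFFICIENCY_TFLOPS_PER_W = 1.8


def implied_figures():
    """Derive implied quantities from the quoted figures.

    All results are assumptions-based estimates, not values from the paper.
    """
    return {
        # Total bandwidth divided by per-pin rate (Tbps -> Gbps first).
        "aib2_data_pins": AIB2_BANDWIDTH_TBPS * 1000.0 / AIB2_PIN_RATE_GBPS,
        # Tbps * pJ/b = (1e12 b/s) * (1e-12 J/b) = W.
        "aib2_io_power_w": AIB2_BANDWIDTH_TBPS * IO_ENERGY_PJ_PER_BIT,
        "aib2_io_power_with_adapter_w": (
            AIB2_BANDWIDTH_TBPS * IO_ENERGY_WITH_ADAPTER_PJ_PER_BIT
        ),
        # Die-edge length and bump area implied by the density figures.
        "aib2_shoreline_mm": AIB2_BANDWIDTH_TBPS / SHORELINE_DENSITY_TBPS_PER_MM,
        "aib2_area_mm2": AIB2_BANDWIDTH_TBPS / AREA_DENSITY_TBPS_PER_MM2,
        # Power of one DSP chiplet if peak throughput and efficiency coincide.
        "dsp_power_at_peak_w": DSP_PEAK_TFLOPS / DSP_EFFICIENCY_TFLOPS_PER_W,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the AIB 2.0 interface would need roughly 1920 data pins, and a DSP chiplet running at its peak would draw roughly 2.3W. Both numbers are rough estimates for orientation rather than measured results.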