{"title":"基于VectorBlox MXP矩阵处理器的fpga嵌入式超级计算","authors":"Aaron Severance, G. Lemieux","doi":"10.1109/CODES-ISSS.2013.6658993","DOIUrl":null,"url":null,"abstract":"Embedded systems frequently use FPGAs to perform highly parallel data processing tasks. However, building such a system usually requires specialized hardware design skills with VHDL or Verilog. Instead, this paper presents the VectorBlox MXP Matrix Processor, an FPGA-based soft processor capable of highly parallel execution. Programmed entirely in C, the MXP is capable of executing data-parallel software algorithms at hardware-like speeds. For example, the MXP running at 200MHz or higher can implement a multi-tap FIR filter and output 1 element per clock cycle. MXP's parameterized design lets the user specify the amount of parallelism required, ranging from 1 to 128 or more parallel ALUs. Key features of the MXP include a parallel-access scratchpad memory to hold vector data and high-throughput DMA and scatter/gather engines. To provide extreme performance, the processor is expandable with custom vector instructions and custom DMA filters. Finally, the MXP seamlessly ties into existing Altera and Xilinx development flows, simplifying system creation and deployment.","PeriodicalId":163484,"journal":{"name":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","volume":"52 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-09-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"64","resultStr":"{\"title\":\"Embedded supercomputing in FPGAs with the VectorBlox MXP Matrix Processor\",\"authors\":\"Aaron Severance, G. Lemieux\",\"doi\":\"10.1109/CODES-ISSS.2013.6658993\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Embedded systems frequently use FPGAs to perform highly parallel data processing tasks. However, building such a system usually requires specialized hardware design skills with VHDL or Verilog. Instead, this paper presents the VectorBlox MXP Matrix Processor, an FPGA-based soft processor capable of highly parallel execution. Programmed entirely in C, the MXP is capable of executing data-parallel software algorithms at hardware-like speeds. For example, the MXP running at 200MHz or higher can implement a multi-tap FIR filter and output 1 element per clock cycle. MXP's parameterized design lets the user specify the amount of parallelism required, ranging from 1 to 128 or more parallel ALUs. Key features of the MXP include a parallel-access scratchpad memory to hold vector data and high-throughput DMA and scatter/gather engines. To provide extreme performance, the processor is expandable with custom vector instructions and custom DMA filters. Finally, the MXP seamlessly ties into existing Altera and Xilinx development flows, simplifying system creation and deployment.\",\"PeriodicalId\":163484,\"journal\":{\"name\":\"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"volume\":\"52 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2013-09-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"64\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CODES-ISSS.2013.6658993\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CODES-ISSS.2013.6658993","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Embedded supercomputing in FPGAs with the VectorBlox MXP Matrix Processor
Embedded systems frequently use FPGAs to perform highly parallel data processing tasks. However, building such a system usually requires specialized hardware design skills with VHDL or Verilog. Instead, this paper presents the VectorBlox MXP Matrix Processor, an FPGA-based soft processor capable of highly parallel execution. Programmed entirely in C, the MXP is capable of executing data-parallel software algorithms at hardware-like speeds. For example, the MXP running at 200MHz or higher can implement a multi-tap FIR filter and output 1 element per clock cycle. MXP's parameterized design lets the user specify the amount of parallelism required, ranging from 1 to 128 or more parallel ALUs. Key features of the MXP include a parallel-access scratchpad memory to hold vector data and high-throughput DMA and scatter/gather engines. To provide extreme performance, the processor is expandable with custom vector instructions and custom DMA filters. Finally, the MXP seamlessly ties into existing Altera and Xilinx development flows, simplifying system creation and deployment.