Compiling Efficiently with Arithmetic Emulation for the Custom-Width Connex Vector Processor
Alexandru E. Susu
WPMVP'19, published 2019-02-16
DOI: https://doi.org/10.1145/3303117.3306166
Citations: 1
Abstract
Compiling sequential C programs with LLVM for the wide Connex vector accelerator, a competitive customizable architecture for embedded applications with 32 to 4096 16-bit integer lanes, is challenging.
Our compiler targets Opincaa, a JIT assembler and coordination C++ library for Connex, which can run programs that are portable with respect to the vector width. For this to work, our back end needs to handle symbolic C/C++ expressions represented as adjacent inline assembly strings, which are used as scalar immediate operands in the vector code.
Our back end for Connex must also lower code to efficiently emulate arithmetic operations on non-native types such as 32-bit integers and 16-bit floating point. To simplify the compiler writer's work, we devise a method that generates the code that lowers these operations inside LLVM's instruction selection pass.
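To illustrate what emulating a non-native type on 16-bit lanes involves, the following is a minimal sketch of a lane-wise 32-bit integer addition built only from 16-bit operations. It is not the lowering used by the Connex back end; the function names (`split32`, `join32`, `add32_on_16bit_lanes`, `add32_vectors`) are hypothetical, and Python lists stand in for vector registers.

```python
MASK16 = 0xFFFF  # each simulated lane holds an unsigned 16-bit value


def split32(x):
    """Split a 32-bit value into (low, high) 16-bit halves."""
    return x & MASK16, (x >> 16) & MASK16


def join32(lo, hi):
    """Recombine (low, high) 16-bit halves into a 32-bit value."""
    return (hi << 16) | lo


def add32_on_16bit_lanes(a_lo, a_hi, b_lo, b_hi):
    """Lane-wise 32-bit add using only 16-bit adds and compares.

    Each 32-bit operand is held as two 16-bit vectors (low and high
    halves). The carry out of the low half is recovered with an
    unsigned compare: the wrapped sum is smaller than an operand
    exactly when the addition overflowed 16 bits.
    """
    out_lo, out_hi = [], []
    for al, ah, bl, bh in zip(a_lo, a_hi, b_lo, b_hi):
        lo = (al + bl) & MASK16           # 16-bit add, wraps around
        carry = 1 if lo < al else 0       # carry out of the low half
        hi = (ah + bh + carry) & MASK16   # 16-bit add with carry in
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi


def add32_vectors(a, b):
    """Convenience wrapper: add two lists of 32-bit values lane by lane."""
    a_lo, a_hi = zip(*(split32(x) for x in a))
    b_lo, b_hi = zip(*(split32(x) for x in b))
    lo, hi = add32_on_16bit_lanes(a_lo, a_hi, b_lo, b_hi)
    return [join32(l, h) for l, h in zip(lo, hi)]
```

Even this simple case expands one 32-bit operation into several 16-bit vector instructions, which suggests why generating such lowering code automatically saves the compiler writer effort.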
We report speedup factors of up to 12.24 when running on a Connex processor with 128 lanes relative to a dual-core ARM Cortex A9 clocked at a frequency 6.67 times higher, and an average energy efficiency improvement of 1.07 times. Note, however, that a Connex IC can achieve an order of magnitude better energy efficiency than our FPGA implementation.