{"title":"Hardware architecture design and mapping of ‘Fast Inverse Square Root’ algorithm","authors":"Saad Zafar, Raviteja Adapa","doi":"10.1109/ICAEE.2014.6838433","DOIUrl":null,"url":null,"abstract":"The Fast Inverse Square Root algorithm has been used in 3D games of past for lighting and reflection calculations, because it offers up to four times performance gains. This paper presents a hardware implementation of the algorithm on an FPGA board by designing the complete architecture and successfully mapping it on Xilinx Spartan 3E after thorough functional verification. The results show that this implementation provides a very efficient single-precision floating point inverse square root calculator with practically accurate results being made available after just 12 short clock cycles. This performance measure is far superior to the software counterpart of the algorithm, and is not processor dependent like rsqrtss of x86 SSE instruction set. Results of this work can aid FPGA based vector processors or graphic processing units with 3D rendering. The hardware design can also form part of a larger floating point arithmetic unit for dedicated reciprocal square root calculations.","PeriodicalId":151739,"journal":{"name":"2014 International Conference on Advances in Electrical Engineering (ICAEE)","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 International Conference on Advances in Electrical Engineering (ICAEE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAEE.2014.6838433","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 11
Abstract
The Fast Inverse Square Root algorithm has been used in 3D games of past for lighting and reflection calculations, because it offers up to four times performance gains. This paper presents a hardware implementation of the algorithm on an FPGA board by designing the complete architecture and successfully mapping it on Xilinx Spartan 3E after thorough functional verification. The results show that this implementation provides a very efficient single-precision floating point inverse square root calculator with practically accurate results being made available after just 12 short clock cycles. This performance measure is far superior to the software counterpart of the algorithm, and is not processor dependent like rsqrtss of x86 SSE instruction set. Results of this work can aid FPGA based vector processors or graphic processing units with 3D rendering. The hardware design can also form part of a larger floating point arithmetic unit for dedicated reciprocal square root calculations.