Improving FPGA based SHA-3 structures
Magnus Sundal, R. Chaves
2017 IEEE International Symposium on Hardware Oriented Security and Trust (HOST), published 2017-05-01
DOI: 10.1109/HST.2017.7951823 (https://doi.org/10.1109/HST.2017.7951823)
Citations: 0
Abstract
This work focuses on FPGA-based implementations of the SHA-3 hash functions. The literature classifies existing implementations by the structural optimization techniques they adopt, namely folding, pipelining, and unrolling. The state-of-the-art structures differ mainly in their level of folding and their number of pipeline stages. Unfolded structures achieve higher throughput, whereas folded structures require fewer area resources at the cost of lower throughput. Because the step mappings introduce data dependencies within a round, design complexity increases as folding is applied. As the literature suggests, the best results are obtained with slice-wise rather than lane-wise folding. With this approach, the resulting structure processes 16 slices per iteration. However, special care must be taken with the data dependencies in the θ and ρ step mappings so that the input values needed to compute the slices are available in each iteration. The ρ dependencies are resolved by rescheduling the round computation as $R_{\mathrm{resc}} = \theta \circ \iota \circ \chi \circ \pi \circ \rho$. This makes it possible to split the round computation into two parts, one computing θ and the other computing π, χ, and ι, with the ρ step mapping embedded into the state memory. This approach, which considers a tradeoff between performance and throughput, mitigates the data dependencies and improves the Throughput per Area efficiency by up to 50% relative to the existing state of the art.
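The effect of the round rescheduling can be checked in software. The Python sketch below is not the authors' hardware design: it implements the five Keccak-f[1600] step mappings on 64-bit lanes following FIPS 202 and compares the standard round R = ι ∘ χ ∘ π ∘ ρ ∘ θ (rightmost applied first, so θ runs first) with the rescheduled R_resc = θ ∘ ι ∘ χ ∘ π ∘ ρ (ρ runs first). All function names are hypothetical, and the framing of the rescheduled loop, one extra θ before the first round and a final round without the trailing θ, is an assumption; the abstract does not say how round boundaries are handled.

```python
import random

MASK = (1 << 64) - 1

def rotl(v, n):
    """Rotate a 64-bit lane left by n bits (bit z moves to bit z + n)."""
    n %= 64
    return ((v << n) | (v >> (64 - n))) & MASK

# Rotation offsets derived as in FIPS 202; lanes are indexed x + 5*y.
RHO = [0] * 25
_x, _y = 1, 0
for _t in range(24):
    RHO[_x + 5 * _y] = ((_t + 1) * (_t + 2) // 2) % 64
    _x, _y = _y, (2 * _x + 3 * _y) % 5

def _rc_bit(t):
    """FIPS 202 rc(t): LFSR with polynomial x^8 + x^6 + x^5 + x^4 + 1."""
    r = 1
    for _ in range(t % 255):
        r <<= 1
        if r & 0x100:
            r ^= 0x171
    return r & 1

# Round constants: bit 2^j - 1 of RC[ir] is rc(j + 7*ir).
RC = [sum(_rc_bit(j + 7 * ir) << ((1 << j) - 1) for j in range(7)) for ir in range(24)]

def theta(A):
    C = [A[x] ^ A[x + 5] ^ A[x + 10] ^ A[x + 15] ^ A[x + 20] for x in range(5)]
    D = [C[(x - 1) % 5] ^ rotl(C[(x + 1) % 5], 1) for x in range(5)]
    return [A[i] ^ D[i % 5] for i in range(25)]

def rho(A):
    return [rotl(A[i], RHO[i]) for i in range(25)]

def pi(A):
    # A'[x, y] = A[(x + 3y) mod 5, x]
    return [A[(x + 3 * y) % 5 + 5 * x] for y in range(5) for x in range(5)]

def chi(A):
    return [A[x + 5 * y] ^ (~A[(x + 1) % 5 + 5 * y] & A[(x + 2) % 5 + 5 * y])
            for y in range(5) for x in range(5)]

def iota(A, ir):
    return [A[0] ^ RC[ir]] + A[1:]

def keccak_f_standard(A):
    # 24 rounds of R = iota o chi o pi o rho o theta
    for ir in range(24):
        A = iota(chi(pi(rho(theta(A)))), ir)
    return A

def keccak_f_rescheduled(A):
    # Assumed framing: a leading theta, 23 rounds of
    # R_resc = theta o iota o chi o pi o rho, then a last round without theta.
    A = theta(A)
    for ir in range(23):
        A = theta(iota(chi(pi(rho(A))), ir))
    return iota(chi(pi(rho(A))), 23)

if __name__ == "__main__":
    state = [random.getrandbits(64) for _ in range(25)]
    print(keccak_f_standard(state) == keccak_f_rescheduled(state))  # prints True
```

Because function composition is associative, both versions agree on every input. What the rescheduling changes is the loop body: ρ becomes the first operation applied to the state read back from memory, which is what lets a hardware design fold it into the memory addressing instead of a separate datapath stage.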
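The θ dependency in a slice-wise folded structure can be made concrete with a small software model. In the sketch below, each slice is a 25-bit integer holding bit (x, y) at position x + 5y, and 16 slices are processed per iteration as stated in the abstract. θ for slice z needs the column parities of slices z and z - 1 (mod 64), so the first slice of each 16-slice group depends on the previous group, and slice 0 depends on slice 63, which a folded datapath only reaches in its last iteration. The helper names are hypothetical, and precomputing the parity of slice 63 before the first iteration is an illustrative assumption; the abstract does not state how the authors supply that value.

```python
import random

MASK = (1 << 64) - 1
SLICES_PER_ITER = 16  # slices processed per iteration, as stated in the abstract

def lanes_to_slices(A):
    """25 lanes (index x + 5y) -> 64 slices; bit x + 5y of slice z is bit z of that lane."""
    return [sum(((A[i] >> z) & 1) << i for i in range(25)) for z in range(64)]

def slices_to_lanes(S):
    return [sum(((S[z] >> i) & 1) << z for z in range(64)) for i in range(25)]

def column_parity(s):
    """5-bit value whose bit x is the XOR of the five bits in column x of slice s."""
    p = 0
    for y in range(5):
        p ^= (s >> (5 * y)) & 0x1F
    return p

def theta_slice(s, par_z, par_prev):
    """theta on one slice: D[x] = C[x-1][z] ^ C[x+1][z-1], XORed into every row y."""
    d = 0
    for x in range(5):
        d |= (((par_z >> ((x - 1) % 5)) ^ (par_prev >> ((x + 1) % 5))) & 1) << x
    return s ^ (d | d << 5 | d << 10 | d << 15 | d << 20)

def theta_folded(S):
    """Slice-wise theta in 64 / SLICES_PER_ITER iterations (a model, not the paper's datapath)."""
    out = []
    carry = column_parity(S[63])  # assumption: parity of slice 63 precomputed for slice 0
    for base in range(0, 64, SLICES_PER_ITER):
        for z in range(base, base + SLICES_PER_ITER):
            p = column_parity(S[z])
            out.append(theta_slice(S[z], p, carry))
            carry = p  # needed by slice z + 1, possibly in the next iteration
    return out

def theta_lanes(A):
    """Reference lane-wise theta (FIPS 202), used only to check the folded version."""
    C = [A[x] ^ A[x + 5] ^ A[x + 10] ^ A[x + 15] ^ A[x + 20] for x in range(5)]
    D = [C[(x - 1) % 5] ^ (((C[(x + 1) % 5] << 1) | (C[(x + 1) % 5] >> 63)) & MASK)
         for x in range(5)]
    return [A[i] ^ D[i % 5] for i in range(25)]

if __name__ == "__main__":
    A = [random.getrandbits(64) for _ in range(25)]
    print(slices_to_lanes(theta_folded(lanes_to_slices(A))) == theta_lanes(A))  # prints True
```

In software the parity carry is trivially sequential; in a folded circuit the same value must be held in a register between iterations, and the wrap-around from slice 63 to slice 0 is the case that needs extra handling.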