Authors: Xuefeng Zhao
Journal: Computer Journal
Published: 2023-03-01
DOI: 10.1093/comjnl/bxad002 (https://doi.org/10.1093/comjnl/bxad002)
Citation count: 0
Secure and Efficient Masking of Lightweight Ciphers in Software and Hardware
Abstract Masking is a well-established and widely deployed countermeasure against side-channel attacks, in both software and hardware. Because masking comes at a great cost, research has focused on how to lower the performance penalty or find efficient masking implementations. Our contribution is twofold. For software masking, we first find bitsliced Sbox implementations with multiplicative complexity 4 and multiplicative depth 2, then adapt the common shares approach introduced by Coron et al. at CHES 2016 so that many cross-products $a_{i}\cdot b_{j}$ can be reused across parallel ISW-based 32-bit nonlinear operations. As a result, we improve the efficiency of 2$\times b/4/32$ parallel high-order ISW masking for RECTANGLE, TANGRAM and KNOT on a 32-bit ARM embedded microprocessor, with a speed-up of roughly 13%-34%, at a cost of $(1+d) \times 32$ bits of randomness. For hardware masking, 4-bit cubic Sboxes with quadratic decomposition length 2, including those of RECTANGLE, TANGRAM, KNOT and LWC third-round candidates, can be implemented as 3-share and 4-share threshold implementations (TI) by decomposing the cubic permutation $S$ into a composition of sub-permutations of lower algebraic degree. We use two decomposition forms: the composition of two quadratic permutations $G$ and $F$, $S = F\circ G$, targets efficiency; the composition of linear permutations $A_i$ and a single quadratic permutation $G$, $S=A_3 \circ G \circ A_2 \circ G \circ A_1$, targets reduced area. For $S = F\circ G$, we introduce a new approach that searches through all possible quadratic permutations $G$ with complexity $2^{25.71}$, which is more efficient than the $2^{26.23}$ of Poschmann et al. (J. Cryptol. 2011). For $S=A_3 \circ G \circ A_2 \circ G \circ A_1$, our approach finds the $A_i$ with complexity $2^{27.71}$, which is more efficient than the method introduced by Moradi et al. at ASIACRYPT 2016. In addition, we propose a new decomposition, $S=G \circ A_2 \circ G \circ A_1$. We can find the fastest and the smallest hardware decomposition implementations of 4-bit permutations for TI with 3 and 4 shares.
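To make the cross-products $a_{i}\cdot b_{j}$ mentioned in the abstract concrete, the following is a minimal sketch of the textbook ISW masked AND on 32-bit bitsliced words with $d+1$ Boolean shares. It is not the paper's optimized common-shares variant, and it makes no claim about the paper's randomness cost: this plain gadget draws one fresh 32-bit word per share pair. The names `share`, `unshare` and `isw_and` are illustrative assumptions, and Python is used only for readability, not as a side-channel-safe implementation.

```python
import secrets
from functools import reduce
from operator import xor

MASK32 = 0xFFFFFFFF  # all values are treated as 32-bit bitsliced words


def share(x, d):
    """Split a 32-bit word x into d+1 Boolean shares whose XOR equals x."""
    shares = [secrets.randbits(32) for _ in range(d)]
    shares.append(reduce(xor, shares, x & MASK32))
    return shares


def unshare(shares):
    """Recombine Boolean shares by XOR-ing them together."""
    return reduce(xor, shares, 0)


def isw_and(a, b):
    """Textbook ISW AND gadget (hypothetical helper, not the paper's code).

    a and b are lists of d+1 shares; the result is a list of d+1 shares
    of (unshare(a) & unshare(b)).
    """
    n = len(a)
    # Diagonal terms a_i & b_i go straight into output share i.
    c = [a[i] & b[i] for i in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            r = secrets.randbits(32)  # fresh mask for the pair (i, j)
            c[i] ^= r
            # Cross-products a_i & b_j and a_j & b_i are folded in
            # behind the mask r, in this bracket order.
            c[j] ^= (r ^ (a[i] & b[j])) ^ (a[j] & b[i])
    return c
```

As a usage example, `unshare(isw_and(share(x, 3), share(y, 3)))` equals `x & y` for any 32-bit `x` and `y`, because the random masks cancel in pairs and the remaining terms sum to the product of the two share sums. The gadget computes every cross-product $a_{i}\cdot b_{j}$ per multiplication, which is the work the abstract says can be reused across parallel operations.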
About the journal:
The Computer Journal is one of the longest-established journals serving all branches of the academic computer science community. It is currently published in four sections.