Parallel implementation of GCM on GPUs

IF 4.1 · CAS Zone 3 (Computer Science) · JCR Q1 (Computer Science, Information Systems)
JaeSeok Lee, DongCheon Kim, Seog Chung Seo
Journal: ICT Express, vol. 11, no. 2, pp. 310-316
Published: 2025-02-14
DOI: 10.1016/j.icte.2025.01.006
URL: https://www.sciencedirect.com/science/article/pii/S2405959525000062
Citations: 0

Abstract

Parallel implementation of GCM on GPUs
This paper presents the first fully parallelized optimization of GCM in a GPU environment. As the IoT era emerges, large numbers of clients communicate with servers, which requires encrypted communication for security. GCM is an AEAD (authenticated encryption with associated data) mode and is currently used in various security protocols, including TLS 1.3 and IPsec. Because encrypted communication with numerous clients is a heavy burden, there has been significant research on using GPUs for high-speed parallel processing in encryption. However, to date, there has been no fully parallelized implementation of GCM on GPUs. This paper proposes a method for parallelizing the challenging GHASH computation in GCM mode, leading to a high-speed parallel implementation of AES-GCM that can exceed 400 Gb/s, meeting the requirements of next-generation communication systems. The proposed approach is algorithm-independent and can be applied to any block cipher. Our implementation on an RTX 4090 demonstrates a performance improvement of ×15.38 compared to the maximum processing throughput of a multi-threaded Intel(R) Core(TM) i7-13700K. It also achieves a ×17.87 improvement compared to a hybrid CPU–GPU system. Compared to the most researched FPGA implementation of GCM, specifically on the Xilinx Ultrascale FPGA, our implementation achieves ×1.11 better performance. Power efficiency is also better than that of other implementations: it is ×3.33 better than a CPU implementation on an Intel Xeon E3-1220 and ×21.09 better than an FPGA implementation of AES on the Xilinx Virtex 7 series, the latter of which does not include full GCM.
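The abstract does not describe how the GHASH computation is parallelized, so the following is only a minimal sketch of the standard algebraic identity that makes such parallelization possible, not the authors' method. GHASH is a polynomial evaluation in the hash key H over GF(2^128), so the block sequence can be cut into chunks, each chunk hashed independently, and the partial results combined after scaling by the right power of H. The function names (`gf_mul`, `ghash_serial`, `ghash_chunked`) and the chunk size are illustrative assumptions; the field multiplication follows the bit ordering of NIST SP 800-38D.

```python
import secrets

R = 0xE1 << 120   # reduction constant R from NIST SP 800-38D
ONE = 1 << 127    # multiplicative identity in GCM's bit-reflected ordering


def gf_mul(x: int, y: int) -> int:
    """Multiply two 128-bit field elements as defined for GCM."""
    z, v = 0, y
    for i in range(128):
        if (x >> (127 - i)) & 1:      # scan x from its leftmost bit
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z


def gf_pow(h: int, n: int) -> int:
    """Compute h**n by repeated multiplication (kept simple, not fast)."""
    result = ONE
    for _ in range(n):
        result = gf_mul(result, h)
    return result


def ghash_serial(h: int, blocks: list[int]) -> int:
    """Reference GHASH core: Horner's rule, one block after another."""
    y = 0
    for block in blocks:
        y = gf_mul(y ^ block, h)
    return y


def ghash_chunked(h: int, blocks: list[int], chunk_size: int) -> int:
    """Hash each chunk independently, then combine the partial results.

    Every loop iteration depends only on its own chunk, so each one could
    run on a separate thread or GPU block (hypothetical mapping).
    """
    n = len(blocks)
    total = 0
    for start in range(0, n, chunk_size):
        chunk = blocks[start:start + chunk_size]
        partial = ghash_serial(h, chunk)
        blocks_after = n - (start + len(chunk))
        # Shift the partial result past all blocks that follow this chunk.
        total ^= gf_mul(partial, gf_pow(h, blocks_after))
    return total


if __name__ == "__main__":
    h = secrets.randbits(128)
    blocks = [secrets.randbits(128) for _ in range(37)]
    assert ghash_chunked(h, blocks, 8) == ghash_serial(h, blocks)
    print("chunked GHASH matches serial GHASH")
```

In a real GPU kernel the powers of H would presumably be precomputed once and the final XOR performed as a tree reduction; this sketch recomputes them for clarity.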
Source journal: ICT Express
CiteScore: 10.20
Self-citation rate: 1.90%
Articles published: 167
Review time: 35 weeks
About the journal: ICT Express, published by the Korean Institute of Communications and Information Sciences (KICS), is an international, peer-reviewed research publication covering all aspects of information and communication technology. The journal aims to publish research that helps advance the theoretical and practical understanding of ICT convergence, platform technologies, communication networks, and device technologies. Technological advancement in the information and communication technology (ICT) sector enables portable devices to stay connected while supporting high data rates, which has driven the recent popularity of smartphones and their considerable impact on economic and social development.