Kyosuke Kageyama, Sota Arai, Hajime Hamano, Xiangbo Kong, T. Kumaki, T. Koide
{"title":"基于cam的移动加速器大规模并行SIMD矩阵核对AES算法进行并行软件加密","authors":"Kyosuke Kageyama, Sota Arai, Hajime Hamano, Xiangbo Kong, T. Kumaki, T. Koide","doi":"10.12720/jait.14.2.355-362","DOIUrl":null,"url":null,"abstract":"—Recently, it has become possible to execute various digital multimedia applications, such as image compression, video compression, and audio processing, on mobile devices — as long as the processing core in the mobile device has the required high levels of performance, versatility, and programmability. Generally speaking, multimedia applications operate by performing repeated arithmetic and table-lookup coding operations. Therefore, to make it easier to achieve those required high levels of performance, versatility, and programmability, we propose an accelerator for mobile Central Processing Units (CPUs) known as a Content Addressable Memory-based massive-parallel Single Instruction Multiple Data (SIMD) Matrix Core (CAMX) that improves the processing speeds of both arithmetic and table-lookup coding operations. Our proposed CAMX, which is equipped with two CAM modules, has highly parallel processing capabilities that facilitate fast table-lookup coding operations. In fact, the results of Advanced Encryption Standard (AES) encryption simulations conducted in this study show that its AES encryption total clock cycles are 1,362,699. Additionally, a detailed breakdown of the number of clock cycles shows 1,312,160 for SubBytes, a combined total of 17,161 for ShiftRows and MixColumns, and 2519 for AddRoundKey. This paper also confirmed that CAMX could process AES encryptions at a rate of 83.17 clock cycles/byte. Also, the performance of CAMX, related works, and existing mobile processors are compared. The related works do not have a dedicated circuit for AES processing. From the comparison results, CAMX provides a performance improvement of approximately 4.4-and 3569.1-times over the related works. The existing mobile processors are Texas Instruments (TI) DM3730 and a TI OMAP3530. From the comparison results, CAMX provides a performance improvement of approximately 2.1 times over TI DM3730 and TI OMAP3530.","PeriodicalId":0,"journal":{"name":"","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2023-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Parallel Software Encryption of AES Algorithm by Using CAM-Based Massive-Parallel SIMD Matrix Core for Mobile Accelerator\",\"authors\":\"Kyosuke Kageyama, Sota Arai, Hajime Hamano, Xiangbo Kong, T. Kumaki, T. Koide\",\"doi\":\"10.12720/jait.14.2.355-362\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"—Recently, it has become possible to execute various digital multimedia applications, such as image compression, video compression, and audio processing, on mobile devices — as long as the processing core in the mobile device has the required high levels of performance, versatility, and programmability. Generally speaking, multimedia applications operate by performing repeated arithmetic and table-lookup coding operations. Therefore, to make it easier to achieve those required high levels of performance, versatility, and programmability, we propose an accelerator for mobile Central Processing Units (CPUs) known as a Content Addressable Memory-based massive-parallel Single Instruction Multiple Data (SIMD) Matrix Core (CAMX) that improves the processing speeds of both arithmetic and table-lookup coding operations. Our proposed CAMX, which is equipped with two CAM modules, has highly parallel processing capabilities that facilitate fast table-lookup coding operations. In fact, the results of Advanced Encryption Standard (AES) encryption simulations conducted in this study show that its AES encryption total clock cycles are 1,362,699. Additionally, a detailed breakdown of the number of clock cycles shows 1,312,160 for SubBytes, a combined total of 17,161 for ShiftRows and MixColumns, and 2519 for AddRoundKey. This paper also confirmed that CAMX could process AES encryptions at a rate of 83.17 clock cycles/byte. Also, the performance of CAMX, related works, and existing mobile processors are compared. The related works do not have a dedicated circuit for AES processing. From the comparison results, CAMX provides a performance improvement of approximately 4.4-and 3569.1-times over the related works. The existing mobile processors are Texas Instruments (TI) DM3730 and a TI OMAP3530. From the comparison results, CAMX provides a performance improvement of approximately 2.1 times over TI DM3730 and TI OMAP3530.\",\"PeriodicalId\":0,\"journal\":{\"name\":\"\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0,\"publicationDate\":\"2023-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.12720/jait.14.2.355-362\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.12720/jait.14.2.355-362","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Parallel Software Encryption of AES Algorithm by Using CAM-Based Massive-Parallel SIMD Matrix Core for Mobile Accelerator
—Recently, it has become possible to execute various digital multimedia applications, such as image compression, video compression, and audio processing, on mobile devices — as long as the processing core in the mobile device has the required high levels of performance, versatility, and programmability. Generally speaking, multimedia applications operate by performing repeated arithmetic and table-lookup coding operations. Therefore, to make it easier to achieve those required high levels of performance, versatility, and programmability, we propose an accelerator for mobile Central Processing Units (CPUs) known as a Content Addressable Memory-based massive-parallel Single Instruction Multiple Data (SIMD) Matrix Core (CAMX) that improves the processing speeds of both arithmetic and table-lookup coding operations. Our proposed CAMX, which is equipped with two CAM modules, has highly parallel processing capabilities that facilitate fast table-lookup coding operations. In fact, the results of Advanced Encryption Standard (AES) encryption simulations conducted in this study show that its AES encryption total clock cycles are 1,362,699. Additionally, a detailed breakdown of the number of clock cycles shows 1,312,160 for SubBytes, a combined total of 17,161 for ShiftRows and MixColumns, and 2519 for AddRoundKey. This paper also confirmed that CAMX could process AES encryptions at a rate of 83.17 clock cycles/byte. Also, the performance of CAMX, related works, and existing mobile processors are compared. The related works do not have a dedicated circuit for AES processing. From the comparison results, CAMX provides a performance improvement of approximately 4.4-and 3569.1-times over the related works. The existing mobile processors are Texas Instruments (TI) DM3730 and a TI OMAP3530. From the comparison results, CAMX provides a performance improvement of approximately 2.1 times over TI DM3730 and TI OMAP3530.