Kyosuke Kageyama, Hajime Hamano, Ryogo Kayama, Tetsushi Koide, T. Kumaki
2023 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), published 2023-06-25. DOI: 10.1109/ITC-CSCC58803.2023.10212453 (https://doi.org/10.1109/ITC-CSCC58803.2023.10212453)
Implementation of Modulo Multiplication with CAM-Based Massive-Parallel SIMD Matrix Core
Semiconductor technology has advanced rapidly in recent years, and mobile devices are now expected to support digital convergence. This paper proposes a CAM-based massive-parallel SIMD matrix core (CAMX) as a mobile device accelerator that offers high performance, programmability, and versatility. The CAMX can process repeated arithmetic and table-lookup coding operations in parallel. Mobile devices must maintain secure communication over the Internet, and ciphers such as the Advanced Encryption Standard (AES) and Rivest-Shamir-Adleman (RSA) are widely used for this purpose. RSA encryption in particular requires modulo multiplication, which involves both multiplication and division. In this paper, modulo multiplication is implemented on the CAMX, which executes it in parallel in 1,104 clock cycles. The clock cycles of the CAMX are also compared with those of an ARM core. The results show that, for 1,024 data items processed in parallel, the CAMX reduces the clock cycle count by approximately 86% relative to the ARM core with NEON.
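
The abstract states that modulo multiplication consists of a multiplication step followed by a division step. The Python sketch below illustrates that decomposition with a shift-and-add multiply and a restoring (shift-and-subtract) division that keeps only the remainder. It is an assumption-laden illustration, not the CAMX implementation: the function names `mod_mul` and `mod_mul_batch`, the `width` parameter, and the choice of bit-serial algorithms are hypothetical and do not come from the paper. The batch helper only models applying the same operation to many operands; it runs sequentially and says nothing about CAMX cycle counts.

```python
from typing import List


def mod_mul(a: int, b: int, n: int, width: int = 16) -> int:
    """Return (a * b) mod n for operands that fit in `width` bits.

    Hypothetical sketch: bit-serial multiply, then bit-serial division.
    """
    if n <= 0:
        raise ValueError("modulus must be positive")
    if not (0 <= a < (1 << width) and 0 <= b < (1 << width)):
        raise ValueError("operands must be non-negative and fit in `width` bits")

    # Multiplication step: add a shifted copy of `a` for every set bit of `b`.
    product = 0
    for i in range(width):
        if (b >> i) & 1:
            product += a << i

    # Division step: restoring division over the 2*width-bit product,
    # scanning from the most significant bit and keeping only the remainder.
    remainder = 0
    for i in range(2 * width - 1, -1, -1):
        remainder = (remainder << 1) | ((product >> i) & 1)
        if remainder >= n:
            remainder -= n
    return remainder


def mod_mul_batch(xs: List[int], ys: List[int], n: int, width: int = 16) -> List[int]:
    """Apply mod_mul element-wise, mimicking one operation over many data items."""
    if len(xs) != len(ys):
        raise ValueError("operand lists must have the same length")
    return [mod_mul(x, y, n, width) for x, y in zip(xs, ys)]
```

For example, `mod_mul_batch(list(range(1024)), [3] * 1024, 251)` computes 1,024 independent products modulo 251, matching the batch size quoted in the abstract. On a SIMD architecture each element would occupy its own lane, whereas this sketch simply loops.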