Hardware overhead analysis of programmability in ARX crypto processing
Mohamed El-Hadedy, K. Skadron
Proceedings of the Fourth Workshop on Hardware and Architectural Support for Security and Privacy, 2015-06-14
DOI: 10.1145/2768566.2768574 (https://doi.org/10.1145/2768566.2768574)
Citations: 2
Abstract
This paper evaluates the area and performance overhead of a programmable cryptographic accelerator specialized for ARX (Add, Rotate, and Xor) based encryption standards, which are common in symmetric cryptography. The overhead is measured by comparing the accelerator with a variety of custom ARX implementations optimized specifically for π-Cipher, a new authenticated-encryption algorithm that offers advantages over AES-GCM and is a candidate in the CAESAR competition. The programmable processor is designed to accommodate different word sizes, block sizes, and security levels, whereas the custom variants require a separate version for each of these configurations. We find that the overhead of programmability is quite high. For example, we implemented the Programmable Processing Element (PPE) in 227 slices, achieving a throughput of about 1.2 Gbps/block regardless of the word size. In comparison, our best custom 64-bit implementation so far requires 445 slices and achieves 3.09 Gbps. This means that two PPEs running in parallel can achieve 75% of the throughput of the custom 64-bit solution, while providing the flexibility to support diverse cryptographic standards.
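The comparison in the abstract can be checked with a few lines of arithmetic. The sketch below is not from the paper: it uses only the four figures quoted above and assumes that throughput scales linearly with the number of PPEs, which the abstract implies but does not state. The function and constant names are hypothetical.

```python
# Figures quoted in the abstract; the PPE throughput is given as "about 1.2 Gbps".
PPE_SLICES = 227
PPE_GBPS = 1.2
CUSTOM64_SLICES = 445
CUSTOM64_GBPS = 3.09


def parallel_ppes(n):
    """Return (slices, Gbps) for n PPEs, ASSUMING ideal linear scaling."""
    return n * PPE_SLICES, n * PPE_GBPS


def mbps_per_slice(gbps, slices):
    """Throughput per unit of area, in Mbps per slice."""
    return gbps * 1000 / slices


slices, gbps = parallel_ppes(2)
throughput_ratio = gbps / CUSTOM64_GBPS   # two PPEs relative to the custom design
area_ratio = slices / CUSTOM64_SLICES     # area of two PPEs relative to the custom design

print(f"2 PPEs: {slices} slices, {gbps:.2f} Gbps")
print(f"throughput ratio vs custom 64-bit: {throughput_ratio:.3f}")
print(f"area ratio vs custom 64-bit: {area_ratio:.3f}")
print(f"PPE: {mbps_per_slice(PPE_GBPS, PPE_SLICES):.2f} Mbps/slice")
print(f"custom 64-bit: {mbps_per_slice(CUSTOM64_GBPS, CUSTOM64_SLICES):.2f} Mbps/slice")
```

With these rounded inputs, two PPEs occupy 454 slices (slightly more than the 445 of the custom design) and reach a throughput ratio of roughly 0.78, close to the 75% quoted in the abstract. The gap is plausibly due to the PPE throughput being stated only approximately.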