ACM Transactions on Embedded Computing Systems: Latest Articles

Transient Fault Detection in Tensor Cores for Modern GPUs
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-08-10, DOI: 10.1145/3687483
M. Hafezan, E. Atoofian
Abstract: Deep neural networks (DNNs) have emerged as an effective solution for many machine learning applications. However, this success comes at the cost of heavy computation. NVIDIA's Volta graphics processing unit (GPU) introduced a specialized hardware unit called the tensor core (TC) to meet the growing computational demands of DNNs. Most previous studies on TCs have focused on improving performance by exploiting the TCs' high degree of parallelism. However, as DNNs are deployed in security-sensitive applications such as autonomous driving, the reliability of TCs is as important as their performance.
In this work, we exploit the unique architectural characteristics of TCs and propose a simple, implementation-efficient hardware technique called fault detection in tensor core (FDTC) to detect transient faults in TCs. In particular, FDTC exploits the zero-valued weights that stem from network pruning, as well as the sparse activations produced by the common ReLU operator, to verify tensor operations. A high level of sparsity in the tensors allows FDTC to run original and verifying products simultaneously, with zero performance penalty. For applications with a low sparsity rate, FDTC relies on temporal redundancy to re-execute effectual products. FDTC schedules verifying products only when multipliers are idle. Our experimental results show that FDTC offers 100% fault coverage with no performance penalty and a small energy overhead in TCs.
Citations: 0
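To make the idea concrete, here is a minimal Python sketch that models FDTC-style checking for a single dot product. It is a hypothetical software model, not the authors' hardware design: the function `fdtc_dot`, the `lanes` parameter (number of multipliers), and the `fault_index` fault-injection hook are illustrative assumptions. Products with a zero operand (from pruned weights or ReLU outputs) are skipped; if enough multiplier slots are left idle, every effectual product is recomputed in those slots at no extra cost, otherwise the remainder is re-executed in extra cycles.

```python
import math
from typing import List, Optional, Tuple


def fdtc_dot(weights: List[float], acts: List[float], lanes: int,
             fault_index: Optional[int] = None) -> Tuple[float, bool, int]:
    """Hypothetical model of sparsity-aware duplicate checking for one dot product.

    Returns (result, fault_detected, extra_cycles).
    """
    # Only effectual products (both operands non-zero) occupy a multiplier;
    # pruned weights and ReLU zeros are skipped.
    effectual = [(w, a) for w, a in zip(weights, acts) if w != 0 and a != 0]
    n = len(effectual)
    cycles = max(1, math.ceil(n / lanes))
    idle_slots = cycles * lanes - n

    # Original products; optionally corrupt one to emulate a transient fault.
    original = [w * a for w, a in effectual]
    if fault_index is not None and 0 <= fault_index < n:
        original[fault_index] += 1.0

    # Verifying products: run in idle slots when there are enough of them
    # (spatial redundancy, no extra cycles); otherwise the remainder is
    # re-executed in additional cycles (temporal redundancy).
    verifying = [w * a for w, a in effectual]
    extra_cycles = 0 if idle_slots >= n else math.ceil((n - idle_slots) / lanes)

    detected = any(o != v for o, v in zip(original, verifying))
    return sum(original), detected, extra_cycles
```

With sparse inputs the verifying products fit in idle slots (`extra_cycles == 0`); a fully dense input on four lanes needs one extra cycle of temporal redundancy.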
Optimizing Dilithium Implementation with AVX2/-512
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-08-10, DOI: 10.1145/3687309
Runqing Xu, Debiao He, Min Luo, Cong Peng, Xiangyong Zeng
Abstract: Dilithium is a signature scheme that NIST is currently standardizing as the Module-Lattice-Based Digital Signature Standard. Because its security is based on lattice problems, it is believed to be secure even against attacks by large-scale quantum computers. Implementation efficiency is important for promoting the migration from current cryptographic algorithms to post-quantum ones. In this paper, we propose several new approaches to optimizing the implementation of Dilithium. First, we improve the efficiency of parallel NTT implementations: our implementations reduce the overhead of shuffling operations and invoke fewer load instructions for the precomputations. Then, we optimize the sampling and bit-packing of polynomial coefficients in Dilithium; a new approach to sampling secret-key polynomials lets us handle twice as many coefficients within one register. The approaches proposed in this paper apply to implementations using both the AVX2 and AVX-512 instruction sets. Taking Dilithium2 as an example, our AVX2 implementation improves KeyGen, Sign, and Verify by 22.7%, 16.9%, and 13.5%, respectively, compared with the previous implementation.
Citations: 0
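The NTT that the paper vectorizes computes polynomial products in the ring Z_q[X]/(X^256 + 1) with q = 8380417. The sketch below is a naive O(n^2) reference transform, written only to show what the NTT computes and to check results; it is not the paper's AVX2/AVX-512 butterfly implementation, and it uses natural rather than bit-reversed output order. The helper names (`find_root_512`, `ntt`, `intt`, `poly_mul_schoolbook`) are hypothetical, and the root of unity is found by search rather than taken from the specification.

```python
from typing import List

Q = 8380417  # Dilithium modulus
N = 256      # ring Z_Q[X]/(X^N + 1)


def find_root_512() -> int:
    """Search for a primitive 512th root of unity mod Q (512 divides Q - 1)."""
    for g in range(2, Q):
        z = pow(g, (Q - 1) // 512, Q)
        if pow(z, 256, Q) == Q - 1:  # z^512 = 1 and z^256 = -1, so order is 512
            return z
    raise ValueError("no primitive 512th root of unity found")


ZETA = find_root_512()
ZPOW = [pow(ZETA, k, Q) for k in range(512)]  # ZETA^k for k mod 512


def ntt(a: List[int]) -> List[int]:
    """Evaluate a at the N roots of X^N + 1, i.e. at ZETA^(2i+1)."""
    return [sum(a[j] * ZPOW[((2 * i + 1) * j) % 512] for j in range(N)) % Q
            for i in range(N)]


def intt(a_hat: List[int]) -> List[int]:
    """Inverse of ntt: interpolate back to coefficient form."""
    n_inv = pow(N, -1, Q)  # modular inverse (Python 3.8+)
    return [n_inv * sum(a_hat[i] * ZPOW[(-(2 * i + 1) * j) % 512] for i in range(N)) % Q
            for j in range(N)]


def poly_mul_schoolbook(a: List[int], b: List[int]) -> List[int]:
    """Negacyclic product mod (X^N + 1, Q), used as a correctness check."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]  # X^N = -1
    return [x % Q for x in c]
```

Pointwise multiplication in the NTT domain replaces the O(n^2) negacyclic schoolbook product; optimized implementations reach O(n log n) with butterfly layers, which is where the shuffling overhead discussed in the abstract arises.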
High Performance and Predictable Shared Last-level Cache for Safety-Critical Systems
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-08-08, DOI: 10.1145/3687308
Zhuanhao Wu, A. Kaushik, Hiren D. Patel
Abstract: We propose ZeroCost-LLC (ZCLLC), a novel shared inclusive last-level cache (LLC) design for timing-predictable multi-core platforms that offers lower worst-case latency (WCL) than a traditional shared inclusive LLC design. ZCLLC achieves low WCL by eliminating certain memory operations, namely cache-line invalidations across the cache hierarchy that occur when a core's memory request misses in the cache hierarchy and the LLC has no vacant entry to accommodate the fetched data. In addition to low WCL, ZCLLC offers performance benefits in the form of additional caching capacity, and, unlike state-of-the-art approaches, it imposes no constraints on its usage across multiple cores. In this work, we describe the impact of LLC cache-line invalidations on the WCL and systematically build solutions that eliminate these invalidations, resulting in ZCLLC. We also present ZCLLC, an optimized variant of ZCLLC that offers lower WCL and improved average-case performance over ZCLLC. We apply optimizations to the shared bus arbitration mechanism and extend the micro-architecture of ZCLLC to allow overlapping memory requests to main memory. Our analysis reveals that the analytical WCL of a memory request under ZCLLC is 87.0%, 93.8%, and 97.1% lower than that under state-of-the-art LLC partition-sharing techniques for 2, 4, and 8 cores, respectively. ZCLLC shows average-case speedups of 1.89×, 3.36×, and 6.24× over the state-of-the-art LLC partition-sharing techniques for 2, 4, and 8 cores, respectively. Compared with the original ZCLLC without any optimizations (ZCLLC-NORMAL), the optimized ZCLLC has analytical WCLs that are 76.5%, 82.6%, and 86.2% lower for 2, 4, and 8 cores, respectively.
Citations: 0
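The invalidations that ZCLLC is designed to eliminate come from the inclusion property of a conventional inclusive LLC: when the LLC is full and a miss forces an eviction, every private copy of the victim line must also be invalidated. The toy model below, a hypothetical sketch of the baseline and not ZCLLC itself, counts these back-invalidations for a fully associative LRU LLC; the class name, the routing of every access through the LLC, and the unbounded private caches are simplifying assumptions.

```python
from collections import OrderedDict
from typing import Hashable, List, Set


class InclusiveLLC:
    """Toy fully associative inclusive LLC with LRU replacement.

    Private caches are modelled as unbounded sets (a simplifying assumption).
    """

    def __init__(self, capacity: int, cores: int) -> None:
        self.capacity = capacity
        self.lines: "OrderedDict[Hashable, None]" = OrderedDict()  # LRU order
        self.private: List[Set[Hashable]] = [set() for _ in range(cores)]
        self.back_invalidations = 0

    def access(self, core: int, addr: Hashable) -> None:
        if addr in self.lines:
            self.lines.move_to_end(addr)  # hit: mark most recently used
        else:
            if len(self.lines) >= self.capacity:
                victim, _ = self.lines.popitem(last=False)  # evict LRU line
                # Inclusion: private copies of the victim must be invalidated.
                for cache in self.private:
                    if victim in cache:
                        cache.discard(victim)
                        self.back_invalidations += 1
            self.lines[addr] = None
        self.private[core].add(addr)
```

For example, with a two-line LLC shared by two cores, the trace (core 0, A), (core 1, B), (core 0, C), (core 1, A) evicts A and then B while both are still held privately, so it incurs two back-invalidations; each one is extra work that a worst-case latency analysis must account for.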
APB-tree: An Adaptive Pre-built Tree Indexing Scheme for NVM-based IoT Systems
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-07-26, DOI: 10.1145/3677179
Shih-Wen Hsu, Yen-Ting Chen, Kam-yiu Lam, Yuan-Hao Chang, W. Shih, Han-Chieh Chao
Abstract: With the proliferation of sensors and the emergence of novel applications, IoT data has grown exponentially in recent years. Given this trend, efficient data management is crucial for a system to access vast amounts of information easily. For decades, B+-tree-based indexing schemes have been widely adopted to provide effective search in IoT systems. However, in systems with pre-distributed sensors, B+-tree-based indexes fail to make optimal use of the known IoT data distribution, leading to significant write overhead and energy consumption. Furthermore, as non-volatile memory (NVM) emerges as an alternative storage medium, the inherent write asymmetry of NVM leads to instability issues in IoT systems, especially for write-intensive applications. In this research, considering the write overhead of tree-based indexing schemes and the assumption of a known key-range distribution, we rethink the design of tree-based indexing schemes and propose an adaptive pre-built tree (APB-tree) indexing scheme that reduces the write overhead of serving key insertions and deletions in NVM-based IoT systems. The APB-tree profiles the hot region of the key distribution within the known key range and pre-allocates the index structure, which lowers online index-management costs and run-time index overhead. Meanwhile, the APB-tree retains the scalability of a tree-based index structure to accommodate the large amount of new data brought by additional nodes joining the IoT system. Extensive experiments demonstrate that our solution significantly improves write performance while keeping energy consumption low in NVM-based IoT systems. We compare the energy and time required for basic key operations such as Put(), Get(), and Delete() in APB-trees and B+-tree-based indexing schemes. Under workloads with varying ratios of these operations, the proposed design reduces execution time by 47% to 72% and energy consumption by 11% to 72% compared with B+-tree-based indexing schemes.
Citations: 0
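The core idea, pre-allocating index structure over a known hot key range so that inserts and deletes inside it do not trigger structural changes, can be sketched as follows. This is a hypothetical illustration, not the APB-tree algorithm: fixed-span Python dictionaries stand in for pre-built leaf nodes, a sorted overflow list stands in for the tree's scalable part, and `structural_writes` is an assumed proxy for the index-maintenance writes that are costly on NVM.

```python
import bisect
from typing import Any, Dict, List, Optional


class PrebuiltIndex:
    """Hypothetical sketch: leaves pre-allocated over a known hot key range [lo, hi)."""

    def __init__(self, lo: int, hi: int, leaf_span: int) -> None:
        self.lo, self.hi, self.leaf_span = lo, hi, leaf_span
        n_leaves = (hi - lo + leaf_span - 1) // leaf_span
        # Pre-built leaves: no splits or merges are needed at run time.
        self.leaves: List[Dict[int, Any]] = [dict() for _ in range(n_leaves)]
        self.overflow_keys: List[int] = []        # sorted keys outside [lo, hi)
        self.overflow_vals: Dict[int, Any] = {}
        self.structural_writes = 0                # proxy for index-maintenance writes

    def _leaf(self, key: int) -> Optional[Dict[int, Any]]:
        if self.lo <= key < self.hi:
            return self.leaves[(key - self.lo) // self.leaf_span]
        return None

    def put(self, key: int, value: Any) -> None:
        leaf = self._leaf(key)
        if leaf is not None:
            leaf[key] = value                     # in-place write, no restructuring
        else:
            if key not in self.overflow_vals:
                bisect.insort(self.overflow_keys, key)
                self.structural_writes += 1
            self.overflow_vals[key] = value

    def get(self, key: int) -> Any:
        leaf = self._leaf(key)
        return leaf.get(key) if leaf is not None else self.overflow_vals.get(key)

    def delete(self, key: int) -> None:
        leaf = self._leaf(key)
        if leaf is not None:
            leaf.pop(key, None)
        elif key in self.overflow_vals:
            del self.overflow_vals[key]
            self.overflow_keys.remove(key)
            self.structural_writes += 1
```

For instance, with a profiled range of [0, 1000) and a leaf span of 100, inserting keys 5 and 950 causes no structural write; only an out-of-range key such as 5000 does.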
Co-Approximator: Enabling Performance Prediction in Colocated Applications
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-07-25, DOI: 10.1145/3677180
Rafiuzzaman Mohammad, S. Gopalakrishnan, Karthik Pattabiraman
Abstract: Today's Internet of Things (IoT) devices can colocate multiple applications on one platform that shares hardware resources. Such colocation can increase the throughput of contemporary IoT applications, similar to the use of multi-tenancy in clouds. However, avoiding performance interference among colocated applications through virtualized performance isolation is expensive on IoT platforms because of their limited resources. Hence, on the one hand, colocated IoT applications without performance isolation contend for limited shared resources, which makes their performance variance discontinuous and unknown a priori. On the other hand, the many possible combinations of colocated applications make the overall state space exceedingly large. Together, these factors make the performance of colocated applications hard to predict, which in turn makes it difficult to plan which applications to colocate on which platform.
We propose Co-Approximator, a technique that systematically samples an exponentially large colocated-application state space and efficiently approximates it from only four available complete colocation samples. We demonstrate Co-Approximator with seventeen standard benchmarks and three pipelined data-processing applications on different IoT platforms, where, on average, Co-Approximator reduces the approximation error of existing techniques from 61% to just 7%.
Citations: 0
Trust Based Active Game Data Collection Scheme in Smart Cities
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-07-22, DOI: 10.1145/3677319
Zhuoqun Xia, Ziyu Wang, Xiao Liu
Abstract: The idea of a smart city is to equip various objects in urban life with sensors that monitor areas and collect sensing data, and to make informed decisions based on the collected data. However, malicious sensor devices may interrupt or interfere with data collection, reducing the integrity and availability of information and thereby harming Internet of Things (IoT) applications. Identifying the credibility of sensor nodes, and thus ensuring credible data collection, is therefore a challenge. This paper proposes a trust-based active game data collection (TAGDC) scheme to collect trust data in the IoT. The TAGDC scheme mainly consists of the following parts: (1) an active trust framework combined with evolutionary game theory, which encourages high-energy sensors to send detection routes and quickly obtains sensor trust; (2) to balance the data-security requirements of subnetworks, an estimate of the number and frequency of detection routes each subnetwork requires, obtained through mechanism modeling and the fuzzy analytic hierarchy process; and (3) an internal trust computing model within each region that evaluates the trust of its nodes. Experimental results show that the TAGDC scheme improves the accuracy of identifying malicious nodes by 20%, reduces the required identification time by 40%, and improves the data collection success rate by 5%.
Citations: 0
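The evolutionary-game part of TAGDC can be illustrated with standard replicator dynamics, in which the share x of sensors that actively send detection routes grows when that strategy earns more than the alternative: dx/dt = x(1 - x)(U_send - U_idle). The payoff terms used here (`reward`, `cost`, and a `penalty` for idle sensors that grows with x) are hypothetical; the abstract does not give TAGDC's actual payoff functions.

```python
def replicator(x0: float, reward: float, cost: float, penalty: float,
               dt: float = 0.01, steps: int = 5000) -> float:
    """Euler integration of dx/dt = x(1 - x)(U_send - U_idle).

    x is the fraction of sensors that actively send detection routes.
    """
    x = x0
    for _ in range(steps):
        u_send = reward - cost      # hypothetical: trust reward minus energy cost
        u_idle = -penalty * x       # hypothetical: idle sensors lose trust as peers probe
        x += dt * x * (1.0 - x) * (u_send - u_idle)
        x = min(max(x, 0.0), 1.0)   # keep x a valid population share
    return x
```

When the reward for sending detection routes exceeds the energy cost, the population converges to full participation (e.g. `replicator(0.1, 3, 1, 0.5)` approaches 1); when the cost dominates, participation dies out.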
PredATW: Predicting the Asynchronous Time Warp Latency For VR Systems
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-07-19, DOI: 10.1145/3677329
Akanksha Dixit, S. Sarangi
Abstract: With the advent of low-power, ultra-fast hardware and GPUs, virtual reality (VR) has gained considerable prominence in the last few years and is used in areas such as education, entertainment, scientific visualization, and computer-aided design. VR applications are highly interactive, and one of their most important performance metrics is the motion-to-photon delay (MPD): the delay from the user's head movement to the time at which the image is updated on the VR screen. Since the human visual system is very spatially sensitive and can detect an error of even a few pixels, the MPD should be as small as possible.
Popular VR vendors use the GPU-accelerated Asynchronous Time Warp (ATW) algorithm to reduce the MPD. ATW reduces the MPD if and only if the warping operation finishes just before the display refreshes. However, because the constituent applications compete for a single shared GPU, the GPU-accelerated ATW algorithm suffers from unpredictable ATW latency, which makes it hard to find the ideal moment to start the time warp so that it completes with the least lag relative to the screen refresh. Hence, the state of the art is to use a separate hardware unit for the time-warping operation. Our approach, PredATW, uses an ML-based hardware predictor to predict the ATW latency of a VR application and then schedules the warp as late as possible while running the time-warping operation on the GPU itself. This is the first work to do so. Our predictor achieves an error of only 0.22 ms in predicting the ATW latency across several popular VR applications. Compared with the baseline architecture, we reduce deadline misses by 80.6%.
Citations: 0
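The scheduling policy, starting the warp as late as possible given a latency prediction, can be sketched as below. This is a hypothetical model: an exponentially weighted moving average stands in for PredATW's ML-based hardware predictor, times are in milliseconds within one refresh period, and `margin` is an assumed safety slack.

```python
from typing import Iterable


def schedule_atw(latencies: Iterable[float], refresh_period: float,
                 margin: float, initial_prediction: float,
                 alpha: float = 0.5) -> int:
    """Count deadline misses when each warp starts as late as the prediction allows.

    Times are in ms, measured from the start of a refresh period.
    """
    prediction = initial_prediction
    misses = 0
    for actual in latencies:
        # Latest start that should still finish before the next refresh.
        start = max(0.0, refresh_period - prediction - margin)
        if start + actual > refresh_period:
            misses += 1
        # Stand-in predictor: exponentially weighted moving average.
        prediction = alpha * actual + (1 - alpha) * prediction
    return misses
```

With steady 2 ms warps and an 11.1 ms refresh period no deadline is missed; a sudden 5 ms warp that the predictor has not yet seen causes one miss, after which the updated prediction moves the start time earlier.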
Lightweight Champions of the World: Side-Channel Resistant Open Hardware for Finalists in the NIST Lightweight Cryptography Standardization Process
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-07-17, DOI: 10.1145/3677320
Kamyar Mohajerani, Luke Beckwith, Abubakr Abdulgadir, J. Kaps, K. Gaj
Abstract: Cryptographic competitions have played a significant role in stimulating the development and release of open hardware for cryptography. The primary reason was the focus of standardization organizations and other contest organizers on the transparency and fairness of hardware benchmarking, which could be achieved only by making all source code available for public scrutiny. Consequently, the number and quality of open-source hardware implementations developed during successive major competitions, such as AES, SHA-3, and CAESAR, steadily increased. However, most of these implementations were still far from being usable in future products because they lacked countermeasures against side-channel analysis (SCA). In this paper, we discuss the first coordinated effort to develop SCA-resistant open hardware for all finalists of a cryptographic standardization process. The developed hardware was then evaluated by independent labs for information leakage and resilience to selected attacks. Our targets were the ten finalists of the NIST Lightweight Cryptography Standardization Process. The authors' contributions included formulating detailed requirements, publicizing the submissions, matching open hardware with suitable SCA-evaluation labs, developing a subset of all implementations, serving as one of the six evaluation labs, performing FPGA benchmarking of all protected and unprotected implementations, and summarizing the results in a comprehensive report. Our results confirm that NIST made the right decision in selecting Ascon as the future lightweight cryptography standard. They also indicate that at least three other algorithms, Xoodyak, TinyJAMBU, and ISAP, were very strong competitors and outperformed Ascon in at least one of the evaluated performance metrics.
Citations: 0
Performance and Communication Cost of Hardware Accelerators for Hashing in Post-Quantum Cryptography
IF 2.8, CAS Zone 3, Computer Science
ACM Transactions on Embedded Computing Systems, Pub Date: 2024-07-09, DOI: 10.1145/3676965
Patrick Karl, Jonas Schupp, Georg Sigl
Abstract: SPHINCS+ is a signature scheme included in the first NIST post-quantum standard; its security is based on the underlying hash primitive. Because most of the runtime of SPHINCS+ is spent evaluating several hash and pseudo-random functions, offloading this computation to dedicated hardware accelerators is a natural step. In this work, we evaluate different architectures for hardware acceleration of such a hash primitive with respect to its use case, in the context of SPHINCS+. We attach hardware accelerators for different hash primitives (SHAKE128 and Ascon-Xof, each in full and round-reduced versions) to CPU interfaces with different transfer speeds. We show that, for most use cases, data transfer determines overall performance if the accelerators are equipped with FIFOs, and that reducing the number of rounds in the permutation does not necessarily lead to significant performance improvements when hardware acceleration is used.
This work extends a conference paper by the same authors, accepted at COSADE'24 and first published in [19], in which different architectures for hash-function hardware accelerators are benchmarked and evaluated with SPHINCS+ as a case study. In this paper, we provide results for additional SPHINCS+ parameter sets and improve the performance of one of the accelerators by adding a RISC-V instruction for faster absorption. We then extend the performance benchmark to include CRYSTALS-Kyber, CRYSTALS-Dilithium, and Falcon. Finally, we provide a power/energy comparison of the accelerators.
Citations: 0
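Much of the hashing in SPHINCS+ comes from WOTS+ hash chains, which helps explain why offloading the hash primitive pays off. The sketch below uses Python's `hashlib.shake_128` to build a simplified chain and computes the number of chains per one-time key from the WOTS+ length formula. It is an illustration under stated assumptions: real SPHINCS+ uses tweakable hash functions keyed with a public seed and a structured address, and here a 4-byte step counter (hypothetical) stands in for that address.

```python
import hashlib
import math


def chain(x: bytes, start: int, steps: int, seed: bytes, n: int = 16) -> bytes:
    """Simplified WOTS+-style hash chain built from SHAKE128.

    The 4-byte step counter is a hypothetical stand-in for the SPHINCS+ address.
    """
    for i in range(start, start + steps):
        x = hashlib.shake_128(seed + i.to_bytes(4, "big") + x).digest(n)
    return x


def wots_len(n: int, w: int) -> int:
    """Number of WOTS+ chains for an n-byte hash and Winternitz parameter w."""
    log_w = w.bit_length() - 1            # w is a power of two
    len1 = math.ceil(8 * n / log_w)       # message chains
    len2 = math.floor(math.log2(len1 * (w - 1)) / log_w) + 1  # checksum chains
    return len1 + len2
```

With n = 16 and w = 16, `wots_len` gives 35 chains, so generating one WOTS+ public key takes 35 × 15 = 525 hash calls, and a SPHINCS+ signature involves many such one-time keys. Because a chain can be resumed from any intermediate step, signing and verification each walk part of the same chain.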