Journal of Systems Architecture: Latest Articles

PFV2: Packet fragmentation with variable size and vigorous mapping in time-sensitive networking
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103457. Pub Date: 2025-05-29. DOI: 10.1016/j.sysarc.2025.103457
Wenyan Yan, Bin Fu, Dongsheng Wei, Renfa Li, Yixue Lei, Yuhang Jia, Guoqi Xie
Abstract: With the rapid advancement of intelligent automobiles, ever-growing communication data imposes high-bandwidth and low-latency requirements. To meet these requirements, automotive Original Equipment Manufacturers (OEMs) widely adopt the domain-centralized Electrical/Electronic (E/E) architecture. In this architecture, Time-Sensitive Networking (TSN) is expected to serve as the backbone network because of its high bandwidth and deterministic communication. TSN uses the Gate Control List (GCL) to divide its time length into multiple time slots, and the intervals of these time slots are not necessarily uniform (they may be equal or unequal). Different flows have varying requirements for time slot sizes. To enhance the acceptance ratio of Time-Triggered (TT) flows through GCL time slot allocation, packet fragmentation (i.e., flow fragmentation) was introduced into TSN in a recent study. The state-of-the-art packet fragmentation solution divides one un-schedulable TT flow into multiple equal-sized packets (i.e., equal-sized time slots). In other words, these fixed-size packets are difficult to map into time slots of different sizes.
This study develops a Packet Fragmentation with Variable-size and Vigorous-mapping (PFV2) technique based on the following three innovations: (1) we implement variable-size packet fragmentation, which iteratively divides the un-schedulable TT flow into smaller packets and then dynamically reschedules these packets; (2) we implement the vigorous mapping solution from packets to time slots by deeply searching for available time slots within the flow's deadline; and (3) we verify PFV2 based on the LS1028A with Cortex-A72 (i.e., the NXP automotive-grade development board). PFV2 improves the acceptance ratio by up to 20.18% and bandwidth utilization by up to 7.024% compared with the state-of-the-art solution. The theoretical and practical co-verification experiments demonstrate that PFV2 can effectively improve the flow acceptance ratio and outperform the state-of-the-art solution.
Citations: 0
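
To make the contrast in the PFV2 abstract concrete, the following minimal Python sketch compares equal-sized fragmentation with a greedy variable-sized fill over free slots of unequal capacity. It is not the authors' algorithm: the function names, the slot-capacity list, and the first-fit order are assumptions made for illustration, and per-fragment header overhead, guard bands, and the iterative rescheduling mentioned in the abstract are ignored.

```python
from math import ceil

def fragment_equal(flow_size, parts, free_slots):
    """Equal-sized fragmentation (illustrative): every fragment has the same
    size and needs its own free slot at least that large.
    Returns [(slot_index, fragment_size), ...] or None if it does not fit."""
    frag = ceil(flow_size / parts)
    usable = [i for i, cap in enumerate(free_slots) if cap >= frag]
    if len(usable) < parts:
        return None
    return [(i, frag) for i in usable[:parts]]

def fragment_variable(flow_size, free_slots):
    """Variable-sized fragmentation (illustrative): walk the free slots that
    lie before the deadline and put as much of the flow as fits in each."""
    remaining = flow_size
    placement = []
    for i, cap in enumerate(free_slots):
        if remaining == 0:
            break
        take = min(cap, remaining)
        if take > 0:
            placement.append((i, take))
            remaining -= take
    return placement if remaining == 0 else None

# Hypothetical free capacity of three slots inside the flow's deadline.
free_slots = [3, 1, 4]
print(fragment_equal(8, 2, free_slots))   # None
print(fragment_variable(8, free_slots))   # [(0, 3), (1, 1), (2, 4)]
```

In this toy case no split into two equal fragments fits, because only one slot can hold a fragment of size 4, whereas the variable-sized fill uses all three slots.
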
Efficient registration-based signature schemes
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103458. Pub Date: 2025-05-29. DOI: 10.1016/j.sysarc.2025.103458
Zhenhua Liu, Yujing Yuan, Zhiqing Chen, Yu Han, Baocang Wang
Abstract: Registration-based cryptography is a newly emerging variant of the public-key cryptosystem that addresses the key-escrow issue inherent in identity-based encryption. However, the key-escrow problem in identity-based signatures remains to be solved. In this work, we present a basic registration-based signature (RBS) scheme, which removes the trusted third party that generates secret keys in identity-based signatures. In the proposed basic RBS scheme, a trie is used to expand the identity space to support a large space of user identities (e.g., arbitrary strings), and a vector commitment is used to bind each public key to its user's identity and to accumulate the registered public keys that enable verification. The basic scheme is proven EUF-CMA secure under the CDH assumption in the random oracle model. Moreover, we provide two improvements to the basic scheme: an adaptive RBS scheme and an updatable RBS scheme. Finally, a simulated implementation shows that the proposed basic RBS scheme is practical and viable at a scale of millions of users.
Citations: 0
BQProfit: Budget and QoS aware task offloading and resource allocation for Profit maximization under MEC platform
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103447. Pub Date: 2025-05-28. DOI: 10.1016/j.sysarc.2025.103447
Akhirul Islam, Manojit Ghose
Abstract: In the era of Multi-access Edge Computing (MEC), efficient partitioning of applications and the subsequent allocation of edge resources to the applications' tasks are crucial for maximizing the service provider's profit. Although a few studies have been conducted recently, they overlook many essential parameters, such as dependencies among tasks, a deadline- and demand-based dynamic charging plan, and cooperation among edge servers. This paper proposes a novel strategy named BQProfit that has two essential components: a Modified Kernighan–Lin (MKL) task offloading approach and a metaheuristic-based genetic algorithm known as Cost Optimized Resource Allocation (CORA). To evaluate the effectiveness of the proposed strategy, we perform an extensive simulation using synthetic and scientific workflow data sets and compare the results with a baseline policy and two state-of-the-art policies. On average, BQProfit increases the service provider's profit by 4.1x compared to the benchmark policies and by 1.5x compared to the best-performing benchmark policy. Our strategy also outperforms these policies by reducing task failure rates by 34.06% on average, and by 25% compared to the best-performing benchmark policy. Additionally, BQProfit shows an average improvement of 70.83% over the state-of-the-art and 44% over the top-performing benchmark policy.
Citations: 0
An FPGA-based bit-level weight sparsity and mixed-bit accelerator for neural networks
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103463. Pub Date: 2025-05-28. DOI: 10.1016/j.sysarc.2025.103463
Xianghong Hu, Shansen Fu, Yuanmiao Lin, Xueming Li, Chaoming Yang, Rongfeng Li, Hongmin Huang, Shuting Cai, Xiaoming Xiong
Abstract: Bit-level weight sparsity and mixed-bit quantization are regarded as effective methods to improve the computing efficiency of convolutional neural network (CNN) accelerators. However, irregular sparse matrices greatly increase the index overhead and hardware resource consumption. Moreover, bit-serial computing (BSC) is usually adopted to implement bit-level weight sparsity on accelerators, and traditional BSC leads to uneven utilization of DSP and LUT resources on the FPGA platform, thereby limiting the overall performance of the accelerator. Therefore, in this work, we present an accelerator designed for bit-level weight sparsity and mixed-bit quantization. We first introduce a non-linear quantization algorithm named bit-level sparsity learned quantizer (BSLQ), which can maintain high accuracy during mixed quantization and guide the accelerator to complete bit-level weight sparse computations using DSP. Based on this algorithm, we implement the multi-channel bit-level sparsity (MCBS) method to mitigate irregularities and reduce the index count associated with bit-level sparsity. Finally, we propose a sparse weight arbitrary basis scratch pad (SWAB SPad) method that enables retrieval of compressed weights without fetching activations, which can save 30.52% of LUTs and 64.02% of FFs. Experimental results demonstrate that when quantizing ResNet50 and VGG16 using 4/8 bits, our approach achieves accuracy that is comparable to or even better than 32-bit (75.98% and 73.70% for the two models). Compared to state-of-the-art FPGA-based accelerators, this accelerator achieves up to a 5.36-fold DSP efficiency improvement and provides an 8.87-fold energy efficiency improvement.
Citations: 0
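
As background for the abstract above, the short Python sketch below shows why bit-level weight sparsity matters for bit-serial computing: a weight is processed one bit at a time, and zero bits contribute nothing, so they can be skipped. This is a generic textbook illustration for unsigned weights, not the BSLQ, MCBS, or SWAB SPad designs; the function name and the step counter are assumptions introduced here.

```python
def bit_serial_multiply(activation, weight, bits=8):
    """Multiply by scanning the weight bit by bit (unsigned weights only).
    Returns (product, steps), where steps counts the non-zero weight bits,
    i.e. the shift-and-add operations that were actually needed."""
    acc = 0
    steps = 0
    for b in range(bits):
        if (weight >> b) & 1:          # zero bits are skipped entirely
            acc += activation << b     # add the shifted activation
            steps += 1
    return acc, steps

# A weight with only two set bits needs two shift-and-add steps out of eight.
print(bit_serial_multiply(7, 0b01000100))  # (476, 2)
```

The fewer set bits a quantized weight has, the fewer steps are needed, which is the kind of sparsity a bit-level-aware quantizer can try to encourage.
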
USEFUSE: Uniform stride for enhanced performance in fused layer architecture of deep neural networks
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103459. Pub Date: 2025-05-27. DOI: 10.1016/j.sysarc.2025.103459
Muhammad Sohail Ibrahim, Muhammad Usman, Jeong-A Lee
Abstract: Convolutional Neural Networks (CNNs) are crucial in various applications, but deploying them on resource-constrained edge devices poses challenges. This study presents Sum-of-Products (SOP) units for convolution, which utilize low-latency left-to-right bit-serial arithmetic to minimize response time and enhance overall performance. The study proposes a methodology for fusing multiple convolution layers to reduce off-chip memory communication and increase overall performance. An effective mechanism detects and skips inefficient convolutions after ReLU layers, minimizing power consumption without compromising accuracy. Additionally, efficient tile movement guarantees uniform access to the fusion pyramid. An analysis demonstrates that the uniform stride strategy improves operational intensity. Two designs cater to varied demands: one focuses on minimal response time for mission-critical applications, and the other targets resource-constrained devices with comparable latency. This approach notably reduces redundant computations, improving the efficiency of CNN deployment on edge devices.
Citations: 0
Multi-client functional encryption for set intersection with non-monotonic access structures in federated learning
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103421. Pub Date: 2025-05-26. DOI: 10.1016/j.sysarc.2025.103421
Ruyuan Zhang, Jinguang Han, Liqun Chen, Yiheng Wei
Abstract: Federated learning (FL) based on cloud servers is a distributed machine learning framework that allows an aggregator and multiple clients to collaboratively train a shared model without exchanging data. To protect the confidentiality of training data, several schemes employing functional encryption (FE) have been presented. However, existing schemes cannot express complex access control policies. In this paper, to realize more flexible and fine-grained access control, we propose a multi-client functional encryption scheme for set intersection with non-monotonic access structures (MCFE-SI-NAS), where multiple clients encrypt their private data independently without any interaction. All ciphertexts are associated with a tag, which can resist "mix-and-match" attacks. The aggregator can aggregate ciphertexts and output the set intersection of any two clients' plaintexts, but cannot learn anything else. We first formalize the definition and security model for the MCFE-SI-NAS scheme and build a concrete construction based on asymmetric prime-order pairings. The security of the designed scheme is formally proven. Furthermore, we implement our MCFE-SI-NAS scheme and provide its efficiency analysis.
Citations: 0
Transferable universal adversarial attack against 3D object detection with latent feature disruption
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103446. Pub Date: 2025-05-22. DOI: 10.1016/j.sysarc.2025.103446
Mumuxin Cai, Xupeng Wang, Ferdous Sohel, Dian Xiao, Hang Lei
Abstract: 3D object detection models are highly vulnerable to adversarial attacks, which expose their weaknesses and, when addressed, help improve the robustness of the models. Existing adversarial attack methods against LiDAR scenes are typically optimized for a single sample and perform poorly in terms of transferability. Adversarial attacks with universal and transferable abilities can bring further guidelines for the robustness study of 3D object detection. In this paper, we propose a universal adversarial perturbation attack against 3D object detection models, which suppresses the detection results and disrupts the latent features simultaneously. Specifically, the universal adversarial perturbation is generated to launch sample-agnostic attacks; it is encoded in elaborate perturbation voxel units and is adaptive to varying scales of LiDAR scenes, as well as to 3D object detectors with different point cloud representations. The proposed transferable attack focuses on the latent feature space and deviates the detectors at the outputs of shallow layers. Moreover, a layer activation loss function is designed, which suppresses the significant features extracted by the backbone network. Extensive experiments on multiple popular 3D object detectors and large-scale datasets demonstrate that the proposed method achieves superior attack success rates, exposing critical robustness issues in current LiDAR-based 3D object detection models.
Citations: 0
TMI: Two-dimensional maintainability index for automotive software maintainability measurement
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103425. Pub Date: 2025-05-21. DOI: 10.1016/j.sysarc.2025.103425
Jiao Xie, Jialing Yang, Sirong Zhao, Jianmei Lei, Kaiwei Fan, Guoqi Xie
Abstract: The mass adoption of autonomous driving and Advanced Driver-Assistance System (ADAS) technologies increases the complexity and maintenance cost of automotive embedded software during model-based development. With model-based development tools (e.g., MATLAB/Simulink), the code is automatically generated by architectural transformation; however, an unreasonable software architecture design can make the generated code difficult to maintain. Currently, both the Microsoft Maintainability Index (MMI) and the Bosch Maintainability Index (BMI) are used to measure the maintainability of code, but neither measures software architecture. We establish a four-layer maintainability index model by analyzing the characteristics of automotive software architecture, the software quality model, and industry-related standards. We then propose a Two-dimensional Maintainability Index (TMI), in which the code and architecture dimensions are considered simultaneously. This is the first maintainability measurement work to combine the code and architecture dimensions while ensuring low redundancy. TMI utilizes nine index metrics to guide architecture refactoring and code optimization, significantly improving software maintainability.
Citations: 0
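
For context on the code-only indices the abstract contrasts with TMI, the sketch below computes the maintainability index as Microsoft documents it for Visual Studio code metrics, which rescales the classic formula to the range 0 to 100. It does not reproduce TMI or BMI, whose metrics are not given in the abstract; the function and parameter names are chosen here for illustration.

```python
import math

def microsoft_maintainability_index(halstead_volume, cyclomatic_complexity, lines_of_code):
    """Maintainability index per the Visual Studio code-metrics documentation:
    MAX(0, (171 - 5.2*ln(HV) - 0.23*CC - 16.2*ln(LOC)) * 100 / 171).
    halstead_volume and lines_of_code must be positive."""
    raw = (171
           - 5.2 * math.log(halstead_volume)
           - 0.23 * cyclomatic_complexity
           - 16.2 * math.log(lines_of_code))
    return max(0.0, raw * 100 / 171)

# Smaller, simpler code scores higher; very large code is clamped at 0.
print(microsoft_maintainability_index(1, 0, 1))            # 100.0
print(microsoft_maintainability_index(1e6, 100, 100_000))  # 0.0
```

All three inputs are code-level measurements, which is the gap the abstract points to: nothing in this formula reflects architecture.
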
Set associative address mapping to improve data throughput and reduce tail latency in SSDs
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103445. Pub Date: 2025-05-21. DOI: 10.1016/j.sysarc.2025.103445
Aobo Yang, Jiaojiao Wu, Jiaxu Wu, Fan Yang, Zhibing Sha, Shiyu Zhong, Zhigang Cai, Jianwei Liao
Abstract: Solid State Drives (SSDs) have become the mainstream storage infrastructure across diverse computing systems. To access the data on the flash memory, a software component called the Flash Translation Layer (FTL) converts the logical address of an I/O request into the corresponding physical address, at the granularity of a page. This process is referred to as page-level address mapping in SSDs. An effective mapping method should fully utilize internal parallelism to maximize the I/O throughput of the SSD device, while also paying attention to long tail latency to guarantee user experience. Existing mapping approaches, however, have yet to address both aspects effectively at the same time. Therefore, this paper proposes a set associative mapping approach, called SAMap, which directs data allocation on the basis of static mapping in order to improve data throughput and reduce long tail latency. Specifically, SAMap groups a number of channels into a set and enables set associative mapping for data allocation. When a write request is mapped to a specific channel by the static mapping policy, SAMap can forward it to any channel in the same set, taking I/O workload balance across channels into account. Trace-driven experiments show that our proposal can enhance I/O data throughput by 36.0% on average and cut tail latency at the 99.99th percentile by between 28.1% and 57.0%, in contrast to existing approaches.
Citations: 0
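
The mapping idea in the SAMap abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the modulo static mapping, the use of per-channel queue depth as the load metric, the tie-breaking rule, and all names are assumptions made here.

```python
def static_channel(lpn, num_channels):
    """Static mapping (assumed here to be modulo on the logical page number)."""
    return lpn % num_channels

def set_associative_channel(lpn, queue_depth, set_size):
    """Pick a channel for a write: start from the statically mapped channel,
    then choose the least-loaded channel within the same set.
    queue_depth[i] is a hypothetical count of pending requests on channel i."""
    home = static_channel(lpn, len(queue_depth))
    start = (home // set_size) * set_size
    candidates = range(start, min(start + set_size, len(queue_depth)))
    # Least pending work first; on a tie prefer the home channel, then low index.
    return min(candidates, key=lambda ch: (queue_depth[ch], ch != home, ch))

# Eight channels in two sets of four; channel 0 is busy, so the write moves.
depth = [5, 0, 2, 1, 0, 0, 0, 0]
print(set_associative_channel(8, depth, 4))   # 1
print(set_associative_channel(13, depth, 4))  # 5
```

Restricting the choice to one set keeps the lookup cheap compared with fully dynamic allocation while still letting a write avoid a congested channel.
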
Hamun: An approximate computing method to prolong the lifespan of ReRAM-based accelerators
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Volume 166, Article 103444. Pub Date: 2025-05-20. DOI: 10.1016/j.sysarc.2025.103444
Mohammad Sabri, Marc Riera, Antonio González
Abstract: ReRAM-based accelerators exhibit enormous potential to increase computational efficiency for DNN inference tasks, delivering significant performance and energy savings over traditional platforms. By incorporating adaptive scheduling, these accelerators dynamically adjust to DNN requirements, optimizing the allocation of constrained hardware resources. However, ReRAM cells have limited endurance cycles due to wear-out from multiple updates for each inference execution, which shortens the lifespan of ReRAM-based accelerators and presents a practical challenge in positioning them as alternatives to conventional platforms like TPUs. Addressing these endurance limitations is essential for making ReRAM-based solutions viable for long-term, high-performance DNN inference.
To address the lifespan limitations of ReRAM-based accelerators, we introduce Hamun, an approximate computing method designed to extend the lifespan of ReRAM-based accelerators through a range of optimizations. Hamun incorporates a novel mechanism that detects cells that have become faulty due to wear-out and retires them, thereby avoiding their otherwise adverse impact on DNN accuracy. Moreover, Hamun extends the lifespan of ReRAM-based accelerators by adapting wear-leveling techniques across various abstraction levels of the accelerator and implementing a batch execution scheme to maximize ReRAM cell usage for multiple inferences. Additionally, Hamun introduces a new approximation method that leverages the fault tolerance characteristics of DNNs to delay the retirement of worn-out cells, reducing the performance penalty of retired cells and further extending the accelerator's lifespan. On average, evaluated on a set of popular DNNs, Hamun demonstrates an improvement in lifespan of 13.2× over a state-of-the-art baseline. The main contributors to this improvement are the fault handling and batch execution schemes, which provide 4.6× and 2.6× lifespan improvements respectively.
Citations: 0