Journal of Systems Architecture: Latest Articles

PaLLOC: Pairwise-based low-latency online coordinated resource manager of last-level cache and memory bandwidth on multicore systems
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-30, DOI: 10.1016/j.sysarc.2025.103427
Yang Bai, Yizhi Huang, Si Chen, Renfa Li
Abstract: Modern advanced multicore CPUs integrate large last-level caches (LLCs) and provide high memory bandwidth, both of which are generally shared among cores. In many scenarios, co-running applications whose demands change dynamically require isolated resources. This drives the need for online partitioning of these shared hardware resources to accommodate applications’ different and varying resource demands. However, dynamically managing LLC and memory bandwidth without prior knowledge requires searching numerous resource configurations to gather sufficient information and find a partitioning solution, which may cause long management latency and limit system performance. To address this problem, we first identify several workload-independent observations and insights through a comprehensive exploration of the configuration space across various benchmarks, which can greatly reduce the need for configuration searches. Guided by these findings, we propose a method that integrates two-step allocation with pairwise search techniques to maximize system instructions-per-cycle (IPC) throughput. Building on this method, we design and implement PaLLOC, a novel low-latency online coordinated resource manager of LLC and memory bandwidth on multicore systems. Comprehensive evaluations on an Intel commodity server demonstrate that PaLLOC consistently exhibits a significant performance advantage across various system workloads with diverse resource requirements, achieving a 1.14x to 1.47x speedup in system IPC throughput over the state-of-the-art online partitioning method, with a management latency of approximately 300 ms under a monitoring period of 10 ms.
Volume 164, Article 103427
Citations: 0
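To make "pairwise search" concrete, here is a minimal sketch of a generic pairwise hill-climbing search over LLC-way allocations: it repeatedly moves one way from one application to another and keeps the move only if total IPC rises. The IPC curves, the function names, the even starting split and the one-way step are all hypothetical; this is not PaLLOC's algorithm, which also coordinates memory bandwidth and uses a two-step allocation guided by the paper's observations.

```python
from itertools import permutations
from typing import Callable, List

# Hypothetical model: ipc_fns[i](ways) returns the IPC of application i
# when it owns `ways` LLC ways. A real manager would measure this online.
IpcFn = Callable[[int], float]


def system_ipc(alloc: List[int], ipc_fns: List[IpcFn]) -> float:
    """System throughput as the sum of per-application IPC."""
    return sum(f(w) for f, w in zip(ipc_fns, alloc))


def pairwise_search(total_ways: int, ipc_fns: List[IpcFn], min_ways: int = 1) -> List[int]:
    """Greedy pairwise hill climbing over LLC-way allocations.

    Tries moving one way from application j to application i and keeps the
    move only if system IPC strictly increases. Stops when no single
    pairwise move helps (a local optimum).
    """
    n = len(ipc_fns)
    if total_ways < n * min_ways:
        raise ValueError("not enough ways for the minimum allocation")
    # Start from an even split; leftover ways go to the first applications.
    alloc = [total_ways // n + (1 if k < total_ways % n else 0) for k in range(n)]
    best = system_ipc(alloc, ipc_fns)
    improved = True
    while improved:
        improved = False
        for i, j in permutations(range(n), 2):
            if alloc[j] - 1 < min_ways:
                continue  # donor would drop below its minimum share
            alloc[i] += 1
            alloc[j] -= 1
            candidate = system_ipc(alloc, ipc_fns)
            if candidate > best:
                best = candidate
                improved = True
            else:
                alloc[i] -= 1  # undo: the move did not help
                alloc[j] += 1
    return alloc


# Usage: a cache-sensitive app whose IPC saturates, next to a streaming app
# whose IPC does not depend on LLC capacity.
fns = [lambda w: 2.0 * (1 - 0.7 ** w), lambda w: 0.8]
print(pairwise_search(11, fns))  # [10, 1]
```

Each accepted move strictly increases system IPC, so the loop terminates, but only at a local optimum; reducing how many such probes are needed is exactly the latency problem the paper targets.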
Efficient and provably secure privacy-preserving two-factor authentication and key-agreement using blockchain and TEE for IoV environments
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-25, DOI: 10.1016/j.sysarc.2025.103422
Qihang Hou, Chingfang Hsu, Man Ho Au, Honglang Hu, Zhuo Zhao, Zeyu Wu
Abstract: With the rapid advancement of wireless communication and cloud computing technologies, the Internet of Vehicles (IoV), which enables information sharing among vehicles, cloud servers, and infrastructure to support intelligent driving and road safety functionalities, has seen widespread adoption. However, transmitting information through public channels in IoV introduces significant privacy and security risks. For example, vehicle location trajectories and road users’ identity information are vulnerable to leakage, and the communication process may be subject to various forms of attack. To address these issues, extensive research has focused on authentication and key agreement (AKA) protocols for intelligent vehicles and cloud servers in IoV. However, existing solutions have several drawbacks, including excessive reliance on third-party entities such as registration authorities, high computational overhead, inadequate security features, and multiple rounds of interaction, all of which fail to meet the resource constraints and real-time communication requirements of IoV. To overcome these limitations, this paper introduces, for the first time, the Trusted Execution Environment (TEE) into IoV authentication and proposes an efficient and provably secure privacy-preserving two-factor authentication and key agreement scheme based on blockchain and TEE, called BPAKA. Compared to existing methods, BPAKA offers several significant improvements. First, it leverages TEE to eliminate the need for trusted third parties, enabling mutual anonymous authentication between intelligent vehicles and cloud servers in IoV scenarios while ensuring a comprehensive set of security properties. Second, BPAKA incorporates a blockchain-based data-sharing framework, ensuring lightweight computational overhead. The AKA process requires only a single round of interaction, thereby fulfilling IoV’s real-time requirements and mitigating the impact of network fluctuations. Third, the security of BPAKA is formally proven using provable security techniques, demonstrating its robustness against various potential threats. Furthermore, performance evaluations show that, compared to existing IoV schemes, BPAKA achieves lower overall computational overhead. In addition, compared with the state-of-the-art TEE-based scheme, BPAKA’s computation overhead inside the TEE is only 50.1% of the latter’s, while its communication and storage costs remain feasible.
Volume 164, Article 103422
Citations: 0
A survey on routing algorithm and router microarchitecture of three-dimensional Network-on-Chip
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-25, DOI: 10.1016/j.sysarc.2025.103429
Yuan Zhang, Zewei Jing, Qinghai Yang, Nan Cheng, Huaxi Gu, Kyung Sup Kwak
Abstract: The continuous advancement of modern integrated circuits has facilitated the emergence of three-dimensional Networks-on-Chip (3D NoCs), characterized by direct vertical inter-layer electrical connections that significantly enhance interconnect density. The performance and efficiency of 3D NoC architectures are jointly influenced by routing algorithms and router microarchitectures, which exhibit a symbiotic and complementary relationship. Routing algorithms determine the paths along which data packets are transmitted, profoundly influencing network latency, throughput, and reliability. Meanwhile, the router executes these algorithms, optimizing overall system efficiency through judicious resource allocation and effective data processing management. In this survey, we categorize routing algorithms according to various criteria, providing a detailed analysis of oblivious, adaptive, and hybrid oblivious-adaptive algorithms based on their degrees of adaptivity. Furthermore, we examine router microarchitectures, classifying them into buffered, bufferless, and hybrid buffered-bufferless designs, depending on whether buffering mechanisms are employed. This survey offers a comprehensive analysis of the co-evolution and co-design of routing algorithms and router microarchitectures, emphasizing that aligning an optimal routing algorithm with the appropriate microarchitecture is critical for better 3D NoC performance.
Volume 164, Article 103429
Citations: 0
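Oblivious routing, the first class in the survey's taxonomy, is easiest to see through its simplest member, dimension-order routing. The sketch below computes an XYZ route in a 3D mesh; the coordinate convention and port labels are hypothetical, and the survey covers many more elaborate oblivious, adaptive and hybrid schemes.

```python
from typing import List, Tuple

# A router position in a 3D mesh: (x, y, z), where z is the die layer.
Coord = Tuple[int, int, int]

# Hypothetical output-port labels per dimension: (positive, negative).
PORTS = (("+X", "-X"), ("+Y", "-Y"), ("+Z", "-Z"))


def xyz_route(src: Coord, dst: Coord) -> List[str]:
    """Dimension-order (XYZ) routing: fully correct the X offset, then Y, then Z.

    The path depends only on (src, dst), never on network state, which is
    what makes the algorithm oblivious (and here also deterministic).
    """
    hops: List[str] = []
    for axis in range(3):
        delta = dst[axis] - src[axis]
        plus, minus = PORTS[axis]
        hops.extend([plus if delta > 0 else minus] * abs(delta))
    return hops


print(xyz_route((0, 0, 0), (2, 1, 1)))  # ['+X', '+X', '+Y', '+Z']
```

Every XYZ route is minimal (its length equals the Manhattan distance), and in a full 3D mesh dimension-order routing is deadlock-free. The price is that it cannot steer around congested or faulty links, including vertical ones, which is the gap the adaptive and hybrid algorithms in the survey address.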
The online reconfiguration of a distributed on-board computer: The time and network behaviour of a dependable scheduling algorithm
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-24, DOI: 10.1016/j.sysarc.2025.103420
Glen te Hofsté, Andreas Lund, Alexandra Coroiu, Marco Ottavi, Daniel Lüdtke
Abstract: On-board computers (OBCs) are at the centre of space-faring systems. With the increasing demand for cost-effective computing power in space, using high-performance commercial-off-the-shelf (COTS) components for OBCs has gained significant traction. COTS components, however, do not provide the necessary fault-tolerance mechanisms. The ScOSA (Scalable On-board computing for Space Avionics) architecture uses COTS components in a distributed system to provide more computing performance and dependability. The effects of node failures are mitigated by removing the failed node from the system through reconfiguration. A reconfiguration is performed using a set of predetermined configurations, which hinders system scalability because memory consumption grows exponentially with the number of nodes.
This paper continues the work on the ScOSA online reconfiguration algorithm as a solution to this scalability problem. The online reconfiguration algorithm, which has been integrated into a scheduler, makes task scheduling decisions at run time, eliminating the need for predetermined configurations. The six-phase scheduling mechanism uses the real-time state of the system and is a step towards higher dependability in distributed on-board computing. New test scenarios have been introduced to provide insight into the temporal and network behaviour of online reconfiguration. Evaluation in terms of time, network traffic and memory usage shows that online reconfiguration is not only capable of dynamically generating configurations but also provides a solution to the scalability problem for systems with varying numbers of both nodes and tasks.
Volume 164, Article 103420
Citations: 0
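The scalability argument can be made concrete. If one predetermined configuration is stored for every non-empty set of surviving nodes, an n-node system needs up to 2**n - 1 of them, whereas an online scheduler computes a single mapping when a failure actually happens. The sketch below pairs that count with a minimal online reassignment that places the heaviest tasks first on the least-loaded surviving node; the task names, loads and greedy policy are hypothetical and far simpler than ScOSA's six-phase mechanism.

```python
from typing import Dict, List


def predetermined_config_count(num_nodes: int) -> int:
    """One stored configuration per non-empty set of surviving nodes."""
    return 2 ** num_nodes - 1


def online_reconfigure(task_loads: Dict[str, float], alive_nodes: List[str]) -> Dict[str, str]:
    """Map every task onto a surviving node at run time.

    Greedy longest-processing-time heuristic: heaviest task first, each to the
    currently least-loaded node. Ties are broken by name for determinism.
    """
    if not alive_nodes:
        raise ValueError("no surviving nodes")
    node_load = {node: 0.0 for node in alive_nodes}
    mapping: Dict[str, str] = {}
    for task, load in sorted(task_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        target = min(alive_nodes, key=lambda node: (node_load[node], node))
        mapping[task] = target
        node_load[target] += load
    return mapping


# Node n2 has failed; the scheduler rebuilds the mapping on n1 and n3 only.
tasks = {"camera": 3.0, "nav": 2.0, "telemetry": 1.0, "housekeeping": 1.0}
print(online_reconfigure(tasks, ["n1", "n3"]))
print(predetermined_config_count(10))  # 1023 stored configurations for 10 nodes
```

The online approach trades memory for computation at failure time, which is why the paper measures reconfiguration time and network traffic alongside memory usage.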
μScope: Evaluating storage stack robustness against SSD’s latency variation
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-24, DOI: 10.1016/j.sysarc.2025.103405
Linxiao Bai, Shanshan Li, Zhouyang Jia, Yu Jiang, Yuanliang Zhang, Zichen Xu, Bin Lin, Si Zheng, Xiangke Liao
Abstract: The rapid development of Solid State Disks (SSDs) has drastically reduced device latency from 100 μs to around 10 μs. However, advertised performance is not always delivered performance. Background operations inside SSDs (e.g., garbage collection and wear leveling) may now severely affect performance. In addition, SSDs are also susceptible to fail-slow failures. Traditionally, studies of SSD-based storage stacks have focused on understanding SSD internal behaviors or on discussing the impact of the software stack on throughput.
In this paper, we conduct an extensive study of the software stack atop low-latency SSDs, especially under device latency variations. We build μScope to overcome two major profiling challenges: achieving fine-grained latency injection and low-overhead monitoring. Via μScope, we obtain three major lessons regarding access patterns, consistency trade-offs and consecutive performance variations, which should benefit developers in further optimizations.
Volume 164, Article 103405
Citations: 0
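The abstract's point that advertised latency is not delivered latency is easiest to see in the tail. The following pure simulation injects occasional latency spikes into a stream of requests and compares the median with the 99.9th percentile. The function names, the 10 µs base latency, the 1 ms spike and the 1% spike rate are hypothetical, and this is not how μScope injects latency into a real storage stack.

```python
import math
import random
from typing import List


def simulate_latencies(n: int, base_us: float = 10.0, spike_us: float = 1000.0,
                       spike_prob: float = 0.01, seed: int = 0) -> List[float]:
    """Simulated per-request device latency in microseconds.

    Every request costs `base_us`; with probability `spike_prob` an extra
    `spike_us` is injected, standing in for a background operation such as
    garbage collection. No real device is touched.
    """
    rng = random.Random(seed)
    return [base_us + (spike_us if rng.random() < spike_prob else 0.0) for _ in range(n)]


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


lat = simulate_latencies(100_000)
print(f"mean={sum(lat) / len(lat):.1f} us  p50={percentile(lat, 50)} us  "
      f"p99.9={percentile(lat, 99.9)} us")
```

With roughly 1% of requests slowed, the median stays at the base latency while the 99.9th percentile is set entirely by the injected spikes, which is why per-request, low-overhead monitoring matters more than average throughput here.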
A Machine Learning-Based Intrusion Detection Framework with Labeled Dataset Generation for IEEE 802.1 Time-Sensitive Networking
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-16, DOI: 10.1016/j.sysarc.2025.103408
Mustafa Topsakal, Selçuk Cevher, Doğanalp Ergenç
Abstract: IEEE 802.1 Time-Sensitive Networking (TSN) technology has been increasingly embraced in mission-critical systems to establish deterministic communication with bounded latency. Since safety and security are of prime importance in such systems, the protection of TSN protocols has also been elevated to one of the highest priorities. In this work, we present a machine learning (ML)-based intrusion detection framework against low-rate denial-of-service (LDoS) attacks on TSN-based platforms. In LDoS attacks, the message period of victim streams is subtly manipulated, which makes the attacks more challenging to detect. Addressing this challenge, we evaluate and compare several ML algorithms within our framework in terms of their attack detection performance and computational cost. We also explore two different mitigation strategies to alleviate the effects of data imbalance, which is inherent to the nature of LDoS. To the best of our knowledge, our work is the first in the literature to present an ML-based intrusion detection framework and a TSN dataset that contains simulated LDoS attacks targeting a TSN-based in-vehicle network.
Volume 164, Article 103408
Citations: 0
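Because LDoS attacks work by subtly altering a stream's message period, one natural input feature for a detector is how far each inter-arrival gap deviates from the configured period. The sketch computes that feature and, to illustrate one common way of handling class imbalance, randomly oversamples the minority (attack) class. The feature, the function names and random oversampling are assumptions for illustration; the abstract does not say which features or which two mitigation strategies the authors use.

```python
import random
from typing import Dict, List, Tuple


def period_deviation(timestamps_us: List[float], period_us: float) -> List[float]:
    """Relative deviation of each inter-arrival gap from the stream's configured period.

    0.0 means the frame arrived exactly on schedule; 0.1 means the gap was 10% too long.
    """
    gaps = [b - a for a, b in zip(timestamps_us, timestamps_us[1:])]
    return [(g - period_us) / period_us for g in gaps]


def random_oversample(samples: List[Tuple[float, int]], seed: int = 0) -> List[Tuple[float, int]]:
    """Balance a binary-labelled set by duplicating random minority samples.

    Assumes labels are 0 (benign) and 1 (attack) and that both classes occur.
    """
    rng = random.Random(seed)
    by_label: Dict[int, List[Tuple[float, int]]] = {0: [], 1: []}
    for sample in samples:
        by_label[sample[1]].append(sample)
    minority, majority = sorted(by_label.values(), key=len)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return samples + extra


# A 1 ms stream whose last two gaps an attacker stretches to 1.1 ms.
dev = period_deviation([0, 1000, 2000, 3100, 4200], period_us=1000)
print(dev)  # [0.0, 0.0, 0.1, 0.1]
```

A 10% stretch is easy to spot in this toy trace, but real LDoS perturbations can sit inside normal jitter, which is why the paper compares learned classifiers rather than a fixed threshold.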
Optimizing both performance and tail latency for B+tree on persistent memory
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-15, DOI: 10.1016/j.sysarc.2025.103406
Xianyu He, Chaoshu Yang, Runyu Zhang, Huizhang Luo, Zhichao Cao, Jeff Zhang
Abstract: B+-trees are widely used in databases and have been optimized for persistent memory (PM) in recent studies. However, existing PM-oriented B+-tree designs suffer from write performance penalties, high tail latency, and scalability issues. These are caused by three critical design limitations and can be amplified on PM because of its asymmetric read and write performance: (1) node splits can lead to massive data migration; (2) frequent node splits can lead to high overhead from cascading modifications; (3) node revision can lead to inefficient parallelism. In this paper, we propose HLTree, a novel B+-tree-based index for PM with High write performance and Low tail latency, to solve the aforementioned issues and optimize both performance and tail latency for PM-oriented B+-trees. First, HLTree employs a new node pre-split strategy to reduce the write overhead of legacy B+-tree designs. Second, HLTree decouples structural modification operations from the critical path of the B+-tree and completes them asynchronously to reduce the overhead of cascading modifications. Finally, HLTree optimizes optimistic version locks to reduce conflicts between readers and writers for lower latency and better scalability. In evaluations on Intel Optane DCPMM, compared with μTree/SSB-Tree/Fast&Fair/FPTree, HLTree provides on average 1.06×/2.38×/2.16×/1.55× read throughput and 1.50×/2.28×/2.13×/1.58× write throughput, respectively. Moreover, HLTree reduces the 99.9th-percentile tail latency by up to one order of magnitude.
Volume 163, Article 103406
Citations: 0
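HLTree's third technique builds on optimistic version locks. For readers unfamiliar with the baseline idea, the sketch shows a generic version: writers make a version counter odd while they modify a node, and readers proceed without locking but retry if the version changed underneath them. The class and function names are hypothetical, and the sketch only illustrates the validation protocol in Python; it is neither HLTree's optimized variant nor a substitute for a real implementation with atomic compare-and-swap and memory fences.

```python
import threading
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class OptimisticVersionLock:
    """Generic optimistic version lock: even version = no writer, odd = write in progress."""

    def __init__(self) -> None:
        self.version = 0
        self._writer_mutex = threading.Lock()  # serializes writers; readers never take it

    def read_begin(self) -> Optional[int]:
        """Snapshot the version, or return None if a writer is active."""
        v = self.version
        return None if v % 2 else v

    def read_validate(self, snapshot: int) -> bool:
        """The read was consistent only if no write started or finished since the snapshot."""
        return self.version == snapshot

    def write_lock(self) -> None:
        self._writer_mutex.acquire()
        self.version += 1  # odd: in-flight readers will fail validation

    def write_unlock(self) -> None:
        self.version += 1  # even again, and different from every earlier snapshot
        self._writer_mutex.release()


def optimistic_read(lock: OptimisticVersionLock, read_fn: Callable[[], T],
                    max_retries: int = 100) -> T:
    """Run read_fn without blocking writers; retry if a write overlapped it."""
    for _ in range(max_retries):
        snapshot = lock.read_begin()
        if snapshot is None:
            continue  # writer active: try again
        value = read_fn()
        if lock.read_validate(snapshot):
            return value
    raise RuntimeError("read kept conflicting with writers")


lock = OptimisticVersionLock()
node_keys = [10, 20, 30]
print(optimistic_read(lock, lambda: list(node_keys)))  # [10, 20, 30]
```

Readers never write shared state, which avoids cache-line contention on the lock word; the cost is wasted work when a writer interferes, which is the reader-writer conflict the abstract says HLTree reduces.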
Cloud-aided attribute-based encryption with efficient tracing and accountability auditing
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-12, DOI: 10.1016/j.sysarc.2025.103407
Fei Meng, Leixiao Cheng
Abstract: Ensuring data confidentiality and preventing data abuse have been major challenges for cloud storage services. Attribute-based encryption (ABE) addresses this problem by allowing authorized users with proper attributes to decrypt the ciphertext. Traceability and accountability are indispensable requirements for an ABE system: they allow tracing users who leak their private keys and auditing whether the authority is accountable for framing innocent users. However, in existing related works, a user typically holds a large private key whose size depends on the number of the user’s attributes, so tracing and auditing are computationally expensive. To reduce these costs, we construct a cloud-aided ABE scheme with efficient tracing and auditing. Specifically, we replace the ABE-type private key of previous works with a “transform key”. Given an ABE ciphertext, if the user is authorized to decrypt it, the cloud can use the transform key to transform the ABE ciphertext into a simpler one related to the user’s identity. In this case, the user only needs to keep a fairly short, constant-size private key. As a result, tracing and auditing such a short private key is much more efficient in our scheme.
Volume 163, Article 103407
Citations: 0
Configuration-aware approaches for enhancing energy efficiency in FPGA-based deep learning accelerators
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-11, DOI: 10.1016/j.sysarc.2025.103410
Chao Qian, Tianheng Ling, Christopher Cichiwskyj, Gregor Schiele
Abstract: In the rapidly evolving domain of the Internet of Things (IoT), this study focuses on enhancing the energy efficiency of deep learning accelerators implemented on FPGA-based heterogeneous platforms, in line with the principles of sustainable computing. Diverging from the conventional focus on the inference phase, this research introduces optimizations aimed at minimizing the overhead of the FPGA configuration phase. By fine-tuning configuration parameters, we achieved a 40.13-fold reduction in the configuration energy spent each time the FPGA is powered on. Furthermore, our Idle-Waiting strategy significantly reduced the overall energy consumption across requests. Under scenarios with regular request periods, the enhanced Idle-Waiting strategy, augmented with power-saving methods, outperforms the traditional On-Off strategy in duty-cycle mode for request periods of up to 499.06 ms. The improvement is most pronounced at a 40 ms request period, where it increases the system’s operational lifetime by a factor of 12.39 within a 4147 J energy budget. Additionally, this paper introduces an adaptive strategy-switching approach to manage scenarios with irregular request periods, employing both predefined and learnable threshold methods. This approach is not only more consistently stable than single-strategy methods but also generally outperforms them. Within this approach, our learnable threshold shows only a 10% performance drop compared to the future-aware strategy and is at least 6% better than single-strategy methods. These results underscore the significant potential for deploying more energy-efficient and sustainable systems within IoT applications. Future research will explore the application of these power-saving techniques to a broader spectrum of tasks on diverse FPGA platforms.
Volume 163, Article 103410
Citations: 0
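The On-Off versus Idle-Waiting trade-off reduces to simple energy arithmetic. Assume On-Off pays a configuration energy E_cfg plus an inference energy E_inf per request and draws nothing while off, while Idle-Waiting pays E_inf plus idle power P_idle for the rest of the period T. Setting E_cfg + E_inf = E_inf + P_idle (T - t_inf) gives a break-even period T* = t_inf + E_cfg / P_idle, below which Idle-Waiting wins. The sketch encodes that model; the parameter values are hypothetical placeholders (only the 4147 J budget comes from the abstract), and the paper's measured 499.06 ms crossover and 12.39x gain depend on its own hardware and power-saving methods.

```python
import math


def on_off_energy_j(e_cfg_j: float, e_inf_j: float) -> float:
    """On-Off: the FPGA is powered off between requests (assumed to draw 0 W),
    so every request pays a full configuration plus one inference."""
    return e_cfg_j + e_inf_j


def idle_wait_energy_j(e_inf_j: float, p_idle_w: float, period_s: float, t_inf_s: float) -> float:
    """Idle-Waiting: the FPGA stays configured and idles for the rest of the period."""
    if period_s < t_inf_s:
        raise ValueError("request period shorter than inference time")
    return e_inf_j + p_idle_w * (period_s - t_inf_s)


def break_even_period_s(e_cfg_j: float, p_idle_w: float, t_inf_s: float) -> float:
    """Period at which both strategies cost the same; Idle-Waiting wins below it."""
    return t_inf_s + e_cfg_j / p_idle_w


def lifetime_s(budget_j: float, energy_per_request_j: float, period_s: float) -> float:
    """How long an energy budget lasts when one request arrives every period."""
    return budget_j / energy_per_request_j * period_s


# Hypothetical parameters, not measured values from the paper.
E_CFG, E_INF, P_IDLE, T_INF = 0.05, 0.01, 0.1, 0.005  # J, J, W, s
period = 0.040  # 40 ms request period
e_off = on_off_energy_j(E_CFG, E_INF)
e_idle = idle_wait_energy_j(E_INF, P_IDLE, period, T_INF)
print(f"break-even period: {break_even_period_s(E_CFG, P_IDLE, T_INF):.3f} s")
print(f"lifetime gain at 40 ms: {lifetime_s(4147, e_idle, period) / lifetime_s(4147, e_off, period):.2f}x")
```

The model also shows why cutting configuration energy matters twice: it lowers the On-Off cost per request and shifts the break-even period, so the better strategy depends on the request period, which motivates the paper's adaptive switching.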
Reducing Errors and Powers in LPDDR for DNN Inference: A Compression and IECC-Based Approach
IF 3.7, Zone 2, Computer Science
Journal of Systems Architecture, Pub Date: 2025-04-10, DOI: 10.1016/j.sysarc.2025.103409
Jae-Youn Hong, Je-Woo Jang, Sung-Hyuk Cho, Youngbae Kong, Sungkyu Kim, Youngjung Kang, Jaehyung Ko, Jaeyong Chung, Joon-Sung Yang
Abstract: In modern edge systems, the demand for data processing, especially for complex DNN tasks, is rapidly increasing. To address this, various compression schemes have been proposed to enable on-device AI while meeting the strict power and storage constraints of edge devices. However, despite these advancements, the compatibility of these compression methods with edge-device memory such as LPDDR has not been thoroughly investigated. LPDDR operates at low voltage and faces reliability challenges such as cell leakage, which is particularly concerning for accuracy-critical applications such as Advanced Driver Assistance Systems (ADAS) or medical devices. To address these reliability concerns, an ECC engine known as IECC is employed within each LPDDR bank. While IECC improves reliability, it also incurs performance penalties due to read-modify-write (RMW) operations and parity storage overhead. This paper introduces RELIA, a DNN weight compression scheme with three-stage protection, aimed at enabling power-efficient and reliable DNN operations in mobile environments. RELIA reduces the operation granularity of the IECC engine to eliminate RMW overhead. Additionally, it proposes a SEC-FOEC(72,64) scheme (Single Error Correction-Frequently Occurring Error Correction) that can correct 99.97% of LPDDR errors. To mitigate the added storage overhead, a compression scheme based on DNN weight characteristics is introduced. Experimental results show that RELIA outperforms traditional IECC schemes, reducing power by 16.12%, cycles by 12.6%, energy by 30.62%, and storage by 22.78%, while offering superior reliability in DNN inference.
Volume 163, Article 103409
Citations: 0
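Two pieces of arithmetic sit behind the abstract's IECC discussion. An (n, k) = (72, 64) code stores 8 check bits per 64 data bits, i.e. 12.5% parity overhead. And when a write covers only part of an ECC codeword, the parity cannot be computed from the new data alone, so the old word must be read, merged, re-encoded and written back (read-modify-write); shrinking the codeword granularity, as RELIA does, lets more writes cover whole codewords. The sketch checks both; the function names and the 8-byte and 16-byte granularities are hypothetical examples, not values from the LPDDR specification or the paper.

```python
def parity_overhead(n_bits: int, k_bits: int) -> float:
    """Check bits stored per data bit for an (n, k) code with k data bits."""
    return (n_bits - k_bits) / k_bits


def needs_rmw(offset_bytes: int, size_bytes: int, word_data_bytes: int) -> bool:
    """True if a write must read-modify-write at least one ECC codeword.

    A write avoids RMW only when it starts and ends on codeword boundaries,
    so every codeword it touches is overwritten in full and its parity can be
    computed from the new data alone.
    """
    if size_bytes <= 0:
        raise ValueError("write size must be positive")
    start_aligned = offset_bytes % word_data_bytes == 0
    end_aligned = (offset_bytes + size_bytes) % word_data_bytes == 0
    return not (start_aligned and end_aligned)


print(parity_overhead(72, 64))  # 0.125, i.e. 12.5% extra storage
print(needs_rmw(0, 8, 16))      # True: an 8-byte write into a 16-byte codeword
print(needs_rmw(0, 8, 8))       # False: the write covers one 8-byte codeword exactly
```

This also explains why the scheme pairs ECC with compression: finer-grained codewords remove RMW traffic but raise the share of storage spent on parity, which the weight compression is meant to offset.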