IEEE Transactions on Computers最新文献

筛选
英文 中文
Raptor-T: A Fused and Memory-Efficient Sparse Transformer for Long and Variable-Length Sequences Raptor-T:用于长序列和变长序列的融合且内存效率高的稀疏变换器
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-16 DOI: 10.1109/TC.2024.3389507
Hulin Wang;Donglin Yang;Yaqi Xia;Zheng Zhang;Qigang Wang;Jianping Fan;Xiaobo Zhou;Dazhao Cheng
{"title":"Raptor-T: A Fused and Memory-Efficient Sparse Transformer for Long and Variable-Length Sequences","authors":"Hulin Wang;Donglin Yang;Yaqi Xia;Zheng Zhang;Qigang Wang;Jianping Fan;Xiaobo Zhou;Dazhao Cheng","doi":"10.1109/TC.2024.3389507","DOIUrl":"10.1109/TC.2024.3389507","url":null,"abstract":"Transformer-based models have made significant advancements across various domains, largely due to the self-attention mechanism's ability to capture contextual relationships in input sequences. However, processing long sequences remains computationally expensive for Transformer models, primarily due to the \u0000<inline-formula><tex-math>$O(n^{2})$</tex-math></inline-formula>\u0000 complexity associated with self-attention. To address this, sparse attention has been proposed to reduce the quadratic dependency to linear. Nevertheless, deploying the sparse transformer efficiently encounters two major obstacles: 1) Existing system optimizations are less effective for the sparse transformer due to the algorithm's approximation properties leading to fragmented attention, and 2) the variability of input sequences results in computation and memory access inefficiencies. We present Raptor-T, a cutting-edge transformer framework designed for handling long and variable-length sequences. Raptor-T harnesses the power of the sparse transformer to reduce resource requirements for processing long sequences while also implementing system-level optimizations to accelerate inference performance. To address the fragmented attention issue, Raptor-T employs fused and memory-efficient Multi-Head Attention. Additionally, we introduce an asynchronous data processing method to mitigate GPU-blocking operations caused by sparse attention. Furthermore, Raptor-T minimizes padding for variable-length inputs, effectively reducing the overhead associated with padding and achieving balanced computation on GPUs. In evaluation, we compare Raptor-T's performance against state-of-the-art frameworks on an NVIDIA A100 GPU. The experimental results demonstrate that Raptor-T outperforms FlashAttention-2 and FasterTransformer, achieving an impressive average end-to-end performance improvement of 3.41X and 3.71X, respectively.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1852-1865"},"PeriodicalIF":3.7,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140617273","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Ara2: Exploring Single- and Multi-Core Vector Processing With an Efficient RVV 1.0 Compliant Open-Source Processor Ara2:利用符合 RVV 1.0 标准的高效开源处理器探索单核和多核矢量处理技术
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-15 DOI: 10.1109/TC.2024.3388896
Matteo Perotti;Matheus Cavalcante;Renzo Andri;Lukas Cavigelli;Luca Benini
{"title":"Ara2: Exploring Single- and Multi-Core Vector Processing With an Efficient RVV 1.0 Compliant Open-Source Processor","authors":"Matteo Perotti;Matheus Cavalcante;Renzo Andri;Lukas Cavigelli;Luca Benini","doi":"10.1109/TC.2024.3388896","DOIUrl":"10.1109/TC.2024.3388896","url":null,"abstract":"Vector processing is highly effective in boosting processor performance and efficiency for data-parallel workloads. In this paper, we present Ara2, the first fully open-source vector processor to support the RISC-V V 1.0 frozen ISA. We evaluate Ara2's performance on a diverse set of data-parallel kernels for various problem sizes and vector-unit configurations, achieving an average functional-unit utilization of 95% on the most computationally intensive kernels. We pinpoint performance boosters and bottlenecks, including the scalar core, memories, and vector architecture, providing insights into the main vector architecture's performance drivers. Leveraging the openness of the design, we implement Ara2 in a 22nm technology, characterize its PPA metrics on various configurations (2-16 lanes), and analyze its microarchitecture and implementation bottlenecks. Ara2 achieves a state-of-the-art energy efficiency of 37.8 DP-GFLOPS/W (0.8V) and 1.35GHz of clock frequency (critical path: \u0000<inline-formula><tex-math>$sim$</tex-math></inline-formula>\u000040 FO4 gates). Finally, we explore the performance and energy-efficiency trade-offs of multi-core vector processors: we find that multiple vector cores help overcome the scalar core issue-rate bound that limits short-vector performance. For example, a cluster of eight 2-lane Ara2 (16 FPUs) achieves more than 3x better performance than a 16-lane single-core Ara2 (16 FPUs) when executing a 32x32x32 matrix multiplication, with 1.5x improved energy efficiency.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1822-1836"},"PeriodicalIF":3.7,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575962","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Construction of Reed-Solomon Erasure Codes With Four Parities Based on Systematic Vandermonde Matrices 基于系统范德蒙德矩阵构建四奇偶校验的里德-所罗门纠错码
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-10 DOI: 10.1109/TC.2024.3387069
Leilei Yu;Yunghsiang S. Han
{"title":"Construction of Reed-Solomon Erasure Codes With Four Parities Based on Systematic Vandermonde Matrices","authors":"Leilei Yu;Yunghsiang S. Han","doi":"10.1109/TC.2024.3387069","DOIUrl":"10.1109/TC.2024.3387069","url":null,"abstract":"In 2021, Tang et al. proposed an improved construction of Reed-Solomon (RS) erasure codes with four parity symbols to accelerate the computation of Reed-Muller (RM) transform-based RS algorithm. The idea is to change the original Vandermonde parity-check matrix into a systematic Vandermonde parity-check matrix. However, the construction relies on a computer search and requires that the size of the information vector of RS codes does not exceed \u0000<inline-formula><tex-math>$52$</tex-math></inline-formula>\u0000. This paper improves its idea and proposes a purely algebraic construction. The proposed method has a more explicit construction, a wider range of codeword lengths, and competitive encoding/erasure decoding computational complexity.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1875-1882"},"PeriodicalIF":3.7,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575807","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators 基于 Chiplet 的加速器上多 DNN 工作负载的多目标硬件映射协同优化
IF 3.6 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-10 DOI: 10.1109/TC.2024.3386067
Abhijit Das;Enrico Russo;Maurizio Palesi
{"title":"Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-Based Accelerators","authors":"Abhijit Das;Enrico Russo;Maurizio Palesi","doi":"10.1109/TC.2024.3386067","DOIUrl":"10.1109/TC.2024.3386067","url":null,"abstract":"The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings together heterogeneous sub-accelerators in the form of chiplets, interconnected by a Network-on-Package (NoP). This paper addresses the challenge of selecting the most suitable sub-accelerators, configuring them, determining their optimal placement in the NoP, and mapping the layers of a predetermined set of DNNs spatially and temporally. The objective is to minimise execution time and energy consumption during parallel execution while also minimising the overall cost, specifically the silicon area, of the accelerator. This paper presents MOHaM, a framework for multi-objective hardware-mapping co-optimisation for multi-DNN workloads on chiplet-based accelerators. MOHaM exploits a multi-objective evolutionary algorithm that has been specialised for the given problem by incorporating several customised genetic operators. MOHaM is evaluated against state-of-the-art Design Space Exploration (DSE) frameworks on different multi-DNN workload scenarios. The solutions discovered by MOHaM are Pareto optimal compared to those by the state-of-the-art. Specifically, MOHaM-generated accelerator designs can reduce latency by up to \u0000<inline-formula><tex-math>$96%$</tex-math></inline-formula>\u0000 and energy by up to \u0000<inline-formula><tex-math>$96.12%$</tex-math></inline-formula>\u0000.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 8","pages":"1883-1898"},"PeriodicalIF":3.6,"publicationDate":"2024-04-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575444","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
COALA: A Compiler-Assisted Adaptive Library Routines Allocation Framework for Heterogeneous Systems COALA:异构系统的编译器辅助自适应库例程分配框架
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-09 DOI: 10.1109/TC.2024.3385269
Qinyun Cai;Guanghua Tan;Wangdong Yang;Xianhao He;Yuwei Yan;Keqin Li;Kenli Li
{"title":"COALA: A Compiler-Assisted Adaptive Library Routines Allocation Framework for Heterogeneous Systems","authors":"Qinyun Cai;Guanghua Tan;Wangdong Yang;Xianhao He;Yuwei Yan;Keqin Li;Kenli Li","doi":"10.1109/TC.2024.3385269","DOIUrl":"10.1109/TC.2024.3385269","url":null,"abstract":"Experienced developers often leverage well-tuned libraries and allocate their routines for computing tasks to enhance performance when building modern scientific and engineering applications. However, such well-tuned libraries are meticulously customized for specific target architectures or environments. Additionally, the performance of their routines is significantly impacted by the actual input data of computing tasks, which often remains uncertain until runtime. Accordingly, statically allocating these library routines may hinder the adaptability of applications and compromise performance, particularly in the context of heterogeneous systems. To address this issue, we propose the Compiler-Assisted Adaptive Library Routines Allocation (COALA) framework for heterogeneous systems. COALA is a fully automated mechanism that employs compiler assistance for dynamic allocation of the most suitable routine to each computing task on heterogeneous systems. It allows the deployment of varying allocation policies tailored to specific optimization targets. During the application compilation process, COALA reconstructs computing tasks and inserts a probe for each of these tasks. Probes serve the purpose of conveying vital information about the requirements of each task, including its computing objective, data size, and computing flops, to a user-level allocation component at runtime. Subsequently, the allocation component utilizes the probe information along with the allocation policy to assign the most optimal library routine for executing the computing tasks. In our prototype, we further introduce and deploy a performance-oriented allocation policy founded on a machine learning-based performance evaluation method for library routines. Experimental verification and evaluation on two heterogeneous systems reveal that COALA can significantly improve application performance, with gains of up to 4.3x for numerical simulation software and 4.2x for machine learning applications, and enhance system utilization by up to 27.8%.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1724-1737"},"PeriodicalIF":3.7,"publicationDate":"2024-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Incendio: Priority-Based Scheduling for Alleviating Cold Start in Serverless Computing Incendio:基于优先级的调度,缓解无服务器计算中的冷启动问题
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-08 DOI: 10.1109/TC.2024.3386063
Xinquan Cai;Qianlong Sang;Chuang Hu;Yili Gong;Kun Suo;Xiaobo Zhou;Dazhao Cheng
{"title":"Incendio: Priority-Based Scheduling for Alleviating Cold Start in Serverless Computing","authors":"Xinquan Cai;Qianlong Sang;Chuang Hu;Yili Gong;Kun Suo;Xiaobo Zhou;Dazhao Cheng","doi":"10.1109/TC.2024.3386063","DOIUrl":"10.1109/TC.2024.3386063","url":null,"abstract":"In serverless computing, cold start results in long response latency. Existing approaches strive to alleviate the issue by reducing the number of cold starts. However, our measurement based on real-world production traces shows that the minimum number of cold starts does not equate to the minimum response latency, and solely focusing on optimizing the number of cold starts will lead to sub-optimal performance. The root cause is that functions have different priorities in terms of latency benefits by transferring a cold start to a warm start. In this paper, we propose \u0000<i>Incendio</i>\u0000, a serverless computing framework exploiting priority-based scheduling to minimize the overall response latency from the perspective of cloud providers. We reveal the priority of a function is correlated to multiple factors and design a priority model based on Spearman's rank correlation coefficient. We integrate a hybrid Prophet-LightGBM prediction model to dynamically manage runtime pools, which enables the system to prewarm containers in advance and terminate containers at the appropriate time. Furthermore, to satisfy the low-cost and high-accuracy requirements in serverless computing, we propose a Clustered Reinforcement Learning-based function scheduling strategy. The evaluations show that Incendio speeds up the native system by 1.4\u0000<inline-formula><tex-math>$times$</tex-math></inline-formula>\u0000, and achieves 23% and 14.8% latency reductions compared to two state-of-the-art approaches.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1780-1794"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Design, Implementation and Evaluation of a New Variable Latency Integer Division Scheme 新型可变延迟整除方案的设计、实施和评估
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-08 DOI: 10.1109/TC.2024.3386060
Marco Angioli;Marcello Barbirotta;Abdallah Cheikh;Antonio Mastrandrea;Francesco Menichelli;Saeid Jamili;Mauro Olivieri
{"title":"Design, Implementation and Evaluation of a New Variable Latency Integer Division Scheme","authors":"Marco Angioli;Marcello Barbirotta;Abdallah Cheikh;Antonio Mastrandrea;Francesco Menichelli;Saeid Jamili;Mauro Olivieri","doi":"10.1109/TC.2024.3386060","DOIUrl":"10.1109/TC.2024.3386060","url":null,"abstract":"Integer division is key for various applications and often represents the performance bottleneck due to its inherent mathematical properties that limit its parallelization. This paper presents a new data-dependent variable latency division algorithm derived from the classic non-performing restoring method. The proposed technique exploits the relationship between the number of leading zeros in the divisor and in the partial remainder to dynamically detect and skip those iterations that result in a simple left shift. While a similar principle has been exploited in previous works, the proposed approach outperforms existing variable latency divider schemes in average latency and power consumption. We detail the algorithm and its implementation in four variants, offering versatility for the specific application requirements. For each variant, we report the average latency evaluated with different benchmarks, and we analyze the synthesis results for both FPGA and ASIC deployment, reporting clock speed, average execution time, hardware resources, and energy consumption, compared with existing fixed and variable latency dividers.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1767-1779"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494681","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Error-Detection Schemes for Analog Content-Addressable Memories 模拟内容可寻址存储器的错误检测方案
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-08 DOI: 10.1109/TC.2024.3386065
Ron M. Roth
{"title":"Error-Detection Schemes for Analog Content-Addressable Memories","authors":"Ron M. Roth","doi":"10.1109/TC.2024.3386065","DOIUrl":"10.1109/TC.2024.3386065","url":null,"abstract":"Analog content-addressable memories (in short, a-CAMs) have been recently introduced as accelerators for machine-learning tasks, such as tree-based inference or implementation of nonlinear activation functions. The cells in these memories contain nanoscale memristive devices, which may be susceptible to various types of errors, such as manufacturing defects, inaccurate programming of the cells, or drifts in their contents over time. The objective of this work is to develop techniques for overcoming the reliability issues that are caused by such error events. To this end, several coding schemes are presented for the detection of errors in a-CAMs. These schemes consist of an encoding stage, a detection cycle (which is performed periodically), and some minor additions to the hardware. During encoding, redundancy symbols are programmed into a portion of the a-CAM (or, alternatively, are written into an external memory). During each detection cycle, a certain set of input vectors is applied to the a-CAM. The schemes differ in several ways, e.g., in the range of alphabet sizes that they are most suitable for, in the tradeoff that each provides between redundancy and hardware additions, or in the type of errors that they handle (Hamming metric versus \u0000<inline-formula><tex-math>$L_{1}$</tex-math></inline-formula>\u0000 metric).","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1795-1808"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140602791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Multi-Grained Trace Collection, Analysis, and Management of Diverse Container Images 多种容器图像的多级跟踪收集、分析和管理
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-08 DOI: 10.1109/TC.2024.3383966
Zhuo Huang;Qi Zhang;Hao Fan;Song Wu;Chen Yu;Hai Jin;Jun Deng;Jing Gu;Zhimin Tang
{"title":"Multi-Grained Trace Collection, Analysis, and Management of Diverse Container Images","authors":"Zhuo Huang;Qi Zhang;Hao Fan;Song Wu;Chen Yu;Hai Jin;Jun Deng;Jing Gu;Zhimin Tang","doi":"10.1109/TC.2024.3383966","DOIUrl":"10.1109/TC.2024.3383966","url":null,"abstract":"Container technology is getting popular in cloud environments due to its lightweight feature and convenient deployment. The container registry plays a critical role in container-based clouds, as many container startups involve downloading layer-structured container images from a container registry. However, the container registry is struggling to efficiently manage images (i.e., transfer and store) with the emergence of diverse services and new image formats. The reason is that the container registry manages images uniformly at layer granularity. On the one hand, such uniform layer-level management probably cannot fit the various requirements of different kinds of containerized services well. On the other hand, new image formats organizing data in blocks or files cannot benefit from such uniform layer-level image management. In this paper, we perform the first analysis of image traces at multiple granularities (i.e., image-, layer-, and file-level) for various services and provide an in-depth comparison of different image formats. The traces are collected from a production-level container registry, amounting to 24 million requests and involving more than 184 TB of transferred data. We provide a number of valuable insights, including request patterns of services, file-level access patterns, and bottlenecks associated with different image formats. Based on these insights, we also propose two optimizations to improve image transfer and application deployment.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1698-1710"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10494783","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575933","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
Analysis and Mitigation of Shared Resource Contention on Heterogeneous Multicore: An Industrial Case Study 异构多核共享资源争用的分析与缓解:工业案例研究
IF 3.7 2区 计算机科学
IEEE Transactions on Computers Pub Date : 2024-04-08 DOI: 10.1109/TC.2024.3386059
Michael Bechtel;Heechul Yun
{"title":"Analysis and Mitigation of Shared Resource Contention on Heterogeneous Multicore: An Industrial Case Study","authors":"Michael Bechtel;Heechul Yun","doi":"10.1109/TC.2024.3386059","DOIUrl":"10.1109/TC.2024.3386059","url":null,"abstract":"In this paper, we present a solution to the industrial challenge put forth by ARM in 2022. We systematically analyze the effect of shared resource contention to an augmented reality head-up display (AR-HUD) case-study application of the industrial challenge on a heterogeneous multicore platform, NVIDIA Jetson Nano. We configure the AR-HUD application such that it can process incoming image frames in real-time at 20Hz on the platform. We use Microarchitectural Denial-of-Service (DoS) attacks as aggressor workloads of the challenge and show that they can dramatically impact the latency and accuracy of the AR-HUD application. This results in significant deviations of the estimated trajectories from known ground truths, despite our best effort to mitigate their influence by using cache partitioning and real-time scheduling of the AR-HUD application. To address the challenge, we propose RT-Gang++, a partitioned real-time gang scheduling framework with last-level cache (LLC) and integrated GPU bandwidth throttling capabilities. By applying RT-Gang++, we are able to achieve desired level of performance of the AR-HUD application even in the presence of fully loaded aggressor tasks.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"73 7","pages":"1753-1766"},"PeriodicalIF":3.7,"publicationDate":"2024-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140575457","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
引用次数: 0
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
相关产品
×
本文献相关产品
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信