Future Generation Computer Systems-The International Journal of Escience: Latest Articles, Page 6

Dynamic spatio-temporal graph interaction attention network for traffic flow prediction

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-09-04 DOI: 10.1016/j.future.2025.108116

Wenshu Li, Jianhang Fei, Yongbing Jiang, Xiaoying Guo, Xiulin Geng, Xiaoyu He

Volume 175, Article 108116

Abstract: Against the backdrop of rapid urbanization, traffic flow prediction has become pivotal in urban transportation management and road planning. However, traffic data exhibit complex spatio-temporal dependencies, including long-term periodic trends and abrupt short-term fluctuations. Moreover, traffic patterns differ markedly across regions due to variations in geographic topology and the dynamic nature of inter-node interactions. To address these challenges, we propose a traffic flow prediction model based on a dynamic spatio-temporal graph interaction attention network (DynSTGIA). The model integrates a Time Fusion Attention (TFA) module to jointly capture localized short-term fluctuations and global long-term temporal dependencies, while a Memory-Guided Spatio-temporal Graph Module (MG-STM) combines learnable memory with multi-head attention to adaptively generate dynamic graphs and capture evolving spatial correlations. Furthermore, to overcome the modality separation of traditional spatio-temporal models and strengthen spatio-temporal fusion, we introduce an interaction learning mechanism that deeply integrates temporal and spatial representations. Extensive experiments on five real-world traffic datasets demonstrate that DynSTGIA achieves improvements of up to 2.1% in MAE and 9.8% in RMSE over strong baselines, confirming its superior performance across diverse traffic scenarios.

Citations: 0
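The abstract describes MG-STM only at a high level (learnable memory plus attention that generates dynamic graphs). As a rough, hypothetical illustration of how such a pipeline can turn node states into a graph, the sketch below uses a generic construction familiar from adaptive-graph traffic models: each node's hidden state attends over a memory bank to produce an embedding, and the adjacency is the row-wise softmax of ReLU(E E^T). The single-head dot-product attention, the function names, and the toy shapes are assumptions, not DynSTGIA's actual module.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, memory):
    """Single-head dot-product attention over a memory bank (list of slot vectors).

    Returns the attention-weighted sum of the slots, used here as a node embedding.
    """
    weights = softmax([sum(q * m for q, m in zip(query, slot)) for slot in memory])
    dim = len(memory[0])
    return [sum(w * slot[k] for w, slot in zip(weights, memory)) for k in range(dim)]


def adaptive_adjacency(embeddings):
    """Row-normalized adjacency: A[i] = softmax over j of ReLU(e_i . e_j)."""
    adj = []
    for ei in embeddings:
        scores = [max(0.0, sum(a * b for a, b in zip(ei, ej))) for ej in embeddings]
        adj.append(softmax(scores))
    return adj


# Hypothetical example: 3 nodes with 2-d hidden states and 2 memory slots.
hidden = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
memory = [[1.0, 0.0], [0.0, 1.0]]
embeddings = [memory_read(h, memory) for h in hidden]
A = adaptive_adjacency(embeddings)
```

Because the hidden states change at every time step, recomputing `A` per step yields a time-varying graph; in a trained model the memory slots would be learnable parameters rather than fixed lists.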

Benchmarking a DNN for aortic valve calcium lesions segmentation on FPGA-based DPU using the vitis AI toolchain

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-09-04 DOI: 10.1016/j.future.2025.108115

Valentina Sisini, Andrea Miola, Giada Minghini, Enrico Calore, Armando Ugo Cavallo, Sebastiano Fabio Schifano, Cristian Zambelli

Volume 175, Article 108115

Abstract: Semantic segmentation assigns a class to every pixel of an image to automatically locate objects in computer vision applications for autonomous vehicles, robotics, agriculture, gaming, and medical imaging. Deep Neural Network models, such as Convolutional Neural Networks (CNNs), are widely used for this purpose. Among the plethora of models, the U-Net is a standard in biomedical imaging. Nowadays, GPUs efficiently perform segmentation and are the reference architectures for running CNNs, while FPGAs compete among alternative inference platforms, promising higher energy efficiency and lower latency. In this contribution, we evaluate the inference performance of FPGA-based Deep Processing Units (DPUs) implemented on the AMD Alveo U55C, using calcium segmentation in cardiac aortic valve computed tomography scans as a benchmark. We design and implement a U-Net-based application, optimize the hyperparameters to maximize prediction accuracy, prune the model to simplify it, and use different numerical quantizations to exploit the low-precision operations supported by DPUs and GPUs and thereby reduce computation time. We describe how to port and deploy the U-Net model on DPUs, and we compare the accuracy, throughput, and energy efficiency achieved with four generations of GPUs and a recent high-end dual 32-core CPU platform. Our results show that a complex DNN like the U-Net can run effectively on DPUs using 8-bit integer computation, achieving a prediction accuracy of approximately 95% in Dice and 91% in IoU scores. These results are comparable to those measured when running the floating-point models on GPUs and CPUs. In terms of computing performance, the DPUs achieve an inference latency of approximately 3.5 ms and a throughput of approximately 4.2 kFPS, outperforming the 64-core CPU system by approximately 10% in latency and by a factor of 2× in throughput, but still not surpassing the GPUs at the same numerical precision. In terms of energy efficiency, the improvements are approximately a factor of 6.7× compared to the CPU and 1.6× compared to the P100 GPU, which is manufactured with the same technology process (16 nm).

Citations: 0
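The Dice and IoU scores reported above are the standard overlap metrics for binary masks, Dice = 2|P∩G| / (|P| + |G|) and IoU = |P∩G| / |P∪G|. The sketch below computes both on flattened 0/1 masks. For a single mask pair they are linked by Dice = 2·IoU / (1 + IoU), so an IoU of 0.91 corresponds to a Dice of about 0.953, consistent with the roughly 95% Dice reported; scores averaged over many images need not satisfy this identity exactly. Returning 1.0 when both masks are empty is an assumed convention, not something stated in the abstract.

```python
def dice_iou(pred, target):
    """Dice and IoU for two equal-length binary masks (lists of 0/1)."""
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    p_sum = sum(1 for p in pred if p)
    t_sum = sum(1 for t in target if t)
    union = p_sum + t_sum - inter
    if union == 0:  # both masks empty: treated as perfect agreement (a convention)
        return 1.0, 1.0
    return 2 * inter / (p_sum + t_sum), inter / union


def dice_from_iou(iou):
    """Per-pair identity Dice = 2 * IoU / (1 + IoU)."""
    return 2 * iou / (1 + iou)


pred = [0, 1, 1, 1, 0, 0]
target = [0, 1, 1, 0, 1, 0]
dice, iou = dice_iou(pred, target)  # dice = 2/3, iou = 0.5
```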

Optimal repair and load balance in locally repairable codes: Design and evaluation

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-09-03 DOI: 10.1016/j.future.2025.108113

Ximeng Chen, Si Wu, Hao Zhao, Yinlong Xu

Volume 175, Article 108113

Abstract: Erasure coding is increasingly deployed in modern clustered storage systems to provide low-cost, reliable storage. In particular, Locally Repairable Codes (LRCs) are a popular family of repair-efficient erasure codes that are widely deployed in practice. In this paper, we analyze the storage process for LRCs in clustered storage systems, formulated as a data partitioning phase followed by a node selection phase. We show that conventional flat partitioning and random partitioning incur significant cross-cluster repair traffic, while random node selection causes storage and network imbalance. To this end, we design a new storage scheme for LRCs composed of an optimal partitioning strategy and an enhanced node selection strategy. Our partitioning strategy minimizes cross-cluster repair traffic by dividing each group of blocks into the minimum number of clusters and then placing the blocks compactly. Our node selection strategy improves load balance by preferentially storing blocks with potentially higher access frequency on less-loaded clusters and nodes. To accommodate access fluctuations, we extend our storage scheme with a rebalancing strategy that restores storage and network balance at both the cluster and node levels. We implement our storage scheme on a key-value store prototype atop Memcached. Evaluation on a LAN testbed shows that our scheme greatly improves repair performance and the load balance ratio compared to the baseline.

Citations: 0
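To see why packing a local group into few clusters reduces cross-cluster repair traffic, consider the simplest LRC local group: k data blocks plus one XOR local parity, so any single lost block is the XOR of the survivors. The sketch below repairs a block and counts cross-cluster transfers for a flat placement (one block per cluster) versus a compact one. The XOR local parity, the hypothetical placements, and the two traffic models (shipping every block individually, or first XOR-ing blocks inside each remote cluster, which is valid because XOR is associative) are illustrative assumptions, not the paper's exact scheme.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def local_repair(group, lost_index):
    """Rebuild one block of a local group [d0, ..., d_{k-1}, p], where
    p = d0 ^ ... ^ d_{k-1}, by XOR-ing all surviving blocks of the group."""
    survivors = [blk for i, blk in enumerate(group) if i != lost_index]
    return xor_blocks(survivors)


def cross_cluster_reads(placement, lost_index):
    """Blocks crossing a cluster boundary when each survivor is shipped individually.

    placement[i] is the cluster that holds block i of the group.
    """
    home = placement[lost_index]
    return sum(1 for i, c in enumerate(placement) if i != lost_index and c != home)


def cross_cluster_transfers_aggregated(placement, lost_index):
    """Transfers when each remote cluster first XORs its own blocks locally
    and ships a single partial result."""
    home = placement[lost_index]
    return len({c for i, c in enumerate(placement) if i != lost_index and c != home})


# Hypothetical local group: 3 data blocks + 1 XOR local parity.
data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
group = data + [xor_blocks(data)]
rebuilt = local_repair(group, lost_index=1)  # b"\x10\x20"

flat = [0, 1, 2, 3]     # one block per cluster
compact = [0, 0, 0, 1]  # group packed into as few clusters as possible
```

With the flat placement every repair touches three remote clusters, while the compact placement needs at most one cross-cluster transfer once partial XORs are aggregated inside clusters.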

Out-of-core aware multi-GPU rendering for large-scale scene visualization

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-09-01 DOI: 10.1016/j.future.2025.108109

Milan Jaros, Lubomir Riha, Petr Strakos, Tomas Kozubek

Volume 175, Article 108109

Abstract: We introduce an out-of-core method for multi-GPU path tracing of large-scale scenes that leverages memory access analysis to optimize data distribution. Our approach enables efficient rendering on multi-GPU systems even when the combined GPU memory is smaller than the total scene size. By partitioning the scene between GPU and CPU memory at the memory management level, our method strategically distributes or replicates scene data across GPUs while storing the remaining data in CPU memory. This hybrid memory strategy ensures efficient access patterns and minimizes performance bottlenecks, facilitating high-quality rendering of massive scenes. The main contribution of this rendering approach is that it enables GPUs to perform accelerated path tracing of massive scenes that could otherwise be rendered only by CPUs. Previous work on this topic enabled the rendering of massive scenes distributed among the memories of all GPUs in one server; however, that method cannot render scenes larger than the total size of all GPU memories. In this paper, we show how to break this barrier by combining the capacity of GPU memories with CPU main memory to render massive scenes with only a minor impact on performance. When applied to a small system with 4 GPUs, the results are equivalent to those of a more powerful 16-GPU system using the same number of GPUs as the smaller system. Therefore, users can render massive scenes even on small systems.

Citations: 0
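The abstract says scene data are distributed or replicated across GPUs according to memory access analysis, with the remainder kept in CPU memory. A minimal greedy sketch of that kind of policy: rank scene chunks by accesses per byte, replicate the hottest chunks on every GPU up to a replication budget, spread the next chunks onto the least-loaded GPU, and fall back to CPU memory when no GPU has room. The chunk granularity, the access-density ranking, and the `replicate_budget` parameter are hypothetical; this is not the authors' algorithm.

```python
def plan_placement(chunks, num_gpus, gpu_capacity, replicate_budget):
    """Greedy out-of-core placement plan.

    chunks: list of (name, size, access_count); sizes in arbitrary units.
    Returns dict name -> "all_gpus" | ("gpu", index) | "cpu".
    """
    # Hottest data per byte first, so scarce GPU memory holds the most-used bytes.
    order = sorted(chunks, key=lambda c: c[2] / c[1], reverse=True)
    used = [0] * num_gpus
    replicated = 0
    plan = {}
    for name, size, _ in order:
        # Replicate on every GPU while the budget and every GPU's capacity allow.
        if replicated + size <= replicate_budget and all(u + size <= gpu_capacity for u in used):
            used = [u + size for u in used]
            replicated += size
            plan[name] = "all_gpus"
            continue
        # Otherwise place a single copy on the least-loaded GPU (it has the most room).
        gpu = min(range(num_gpus), key=lambda i: used[i])
        if used[gpu] + size <= gpu_capacity:
            used[gpu] += size
            plan[name] = ("gpu", gpu)
        else:
            plan[name] = "cpu"  # out-of-core: served from host memory
    return plan


chunks = [("bvh", 2, 100), ("tex_a", 4, 40), ("tex_b", 4, 20), ("mesh", 8, 8)]
plan = plan_placement(chunks, num_gpus=2, gpu_capacity=8, replicate_budget=2)
```

Replication spares a GPU from fetching data held by its peers, but it consumes capacity on every GPU at once, which is why the sketch caps it with a budget and reserves it for the most frequently accessed bytes.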

Cybersecurity in the age of generative AI: A systematic taxonomy of AI-powered vulnerability assessment and risk management

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-08-28 DOI: 10.1016/j.future.2025.108107

Seyedeh Leili Mirtaheri, Narges Movahed, Reza Shahbazian, Valerio Pascucci, Andrea Pugliese

Citations: 0

Generative policy-driven HAC reinforcement learning for autonomous driving incident response

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-08-28 DOI: 10.1016/j.future.2025.108106

Hongtao Zhang, Jin-Qiang Wang, Shengjie Zhang, Yuanbo Jiang, Mengling Li, Binbin Yong, Qingguo Zhou, Xiaokang Zhou

Volume 175, Article 108106

Abstract: Reinforcement learning (RL) has become a pivotal approach to autonomous driving decision problems owing to its strong decision optimization capabilities. Existing discrete-time RL frameworks based on Markov decision process modeling face significant challenges in incident response control: they lead to high collision rates under low-frequency decision-making and severe action oscillations under high-frequency decision-making. The fundamental limitation is that discrete-time RL methods cannot adapt to real driving scenarios, where vehicle decisions rely on continuous-time dynamic system modeling. To address this, we propose a generative policy-driven Hamilton-Jacobi-Bellman Actor-Critic (HAC) RL framework, which leverages the Actor to generate action policies and extends continuous-time Hamilton-Jacobi-Bellman capabilities to discrete-time Actor-Critic frameworks through Lipschitz constraints on vehicle control actions. Specifically, the HAC framework integrates deep deterministic policy gradient (DDPG) to implement HJ-DDPG, which incorporates two optimizations, delayed policy network updates and dynamic parameter space noise, to improve policy evaluation accuracy and exploration capability. Experimental results demonstrate that, under high-speed conditions, vehicles trained with the proposed method achieved 52% lower average jerk and 48% lower steering rates than the baseline method, Proximal Policy Optimization (PPO), resulting in smoother and safer lane-changing maneuvers.

Citations: 0
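The Lipschitz constraint on control actions, and the jerk metric used in the evaluation, can be made concrete with a small sketch. Its simplest discrete-time reading bounds how fast an action may change, |a_t − a_{t−1}| ≤ L·Δt, which the sketch enforces by clipping each component of the actor's proposed action. The second function computes mean absolute jerk as the finite difference of a sampled acceleration trace. This rate limiter is a generic stand-in, not the paper's HJ-DDPG formulation, and the constant `L`, the time step, and the two-component action layout are hypothetical.

```python
def lipschitz_limit(prev_action, proposed_action, lipschitz_const, dt):
    """Clip each action component so that |a_t - a_{t-1}| <= L * dt."""
    max_step = lipschitz_const * dt
    return [prev + max(-max_step, min(max_step, prop - prev))
            for prev, prop in zip(prev_action, proposed_action)]


def mean_abs_jerk(accelerations, dt):
    """Mean absolute finite-difference jerk of a sampled acceleration trace."""
    jerks = [(a1 - a0) / dt for a0, a1 in zip(accelerations, accelerations[1:])]
    return sum(abs(j) for j in jerks) / len(jerks)


# Hypothetical 2-d action [steering, acceleration]; L = 2.0 per second, dt = 0.1 s.
prev = [0.0, 0.0]
raw = [1.0, -0.05]  # an abrupt actor output
smoothed = lipschitz_limit(prev, raw, lipschitz_const=2.0, dt=0.1)  # about [0.2, -0.05]
```

A smaller `L` suppresses oscillation at high decision frequencies at the cost of slower reactions, which is the trade-off the constraint is meant to manage.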

DeFT: Relaxing data dependencies for efficient communication scheduling in distributed training

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-08-27 DOI: 10.1016/j.future.2025.108103

Lin Meng, Yuzhong Sun, Jie Zhu

Volume 175, Article 108103

Abstract: Communication scheduling aims to reduce communication bottlenecks in data parallel (DP) training by maximizing the overlap between computation and communication. However, existing schemes fall short due to three main issues: (1) hard data dependencies prevent some overlap between communication and computation; (2) high coverage rates limit further performance improvement; (3) imbalanced communication/computation times of tensors, caused by partitioning/fusion strategies, create more bubbles. We therefore propose DeFT, a new communication scheduling scheme whose key insight is to relax data dependencies and support flexible scheduling in distributed training without reordering bucket communications. DeFT uncovers new overlapping opportunities in training by transforming the scheduling problem into multiple knapsack problems. Specifically, DeFT eliminates hard dependencies with delayed updates, reduces the coverage rate by adjusting the update frequency and utilizing heterogeneous communication links, and uses the merged backward or forward computation times as the knapsack capacity to avoid the negative impact of unbalanced tensors. Additionally, DeFT preserves training accuracy by adjusting its scheduling strategy via convergence loss quantification. Extensive experiments with 16 A100 GPUs show that DeFT achieves speedups of 29% to 115% on three representative benchmarks compared with US-Byte and ByteScheduler, with no loss of accuracy.

Citations: 0
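DeFT frames scheduling as knapsack problems in which merged computation time serves as the capacity. In the simplest reading, each computation window (for example, one backward pass) is a knapsack with capacity equal to its duration, and each pending bucket communication is an item whose weight and value both equal its transfer time, so a full window hides as much communication as possible. The sketch fills windows one after another with an exact 0/1 knapsack dynamic program over integer time units. This greedy window-by-window decomposition and all names are assumptions, not DeFT's actual solver, and it omits delayed updates, heterogeneous links, and convergence-aware adjustment.

```python
def knapsack_select(comm_times, capacity):
    """0/1 knapsack with weight == value == communication time (integer units).

    Returns (best_total, sorted indices of the chosen communications).
    """
    n = len(comm_times)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        w = comm_times[i - 1]
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]  # skip item i-1
            if w <= c and best[i - 1][c - w] + w > best[i][c]:
                best[i][c] = best[i - 1][c - w] + w  # take item i-1
    chosen, c = [], capacity
    for i in range(n, 0, -1):  # backtrack through the table
        if best[i][c] != best[i - 1][c]:
            chosen.append(i - 1)
            c -= comm_times[i - 1]
    return best[n][capacity], sorted(chosen)


def fill_windows(comm_times, windows):
    """Assign communications to successive compute windows, one knapsack each.

    Returns (per-window lists of communication indices, unscheduled indices).
    """
    remaining = list(range(len(comm_times)))
    schedule = []
    for cap in windows:
        _, picked = knapsack_select([comm_times[j] for j in remaining], cap)
        schedule.append([remaining[k] for k in picked])
        remaining = [j for k, j in enumerate(remaining) if k not in picked]
    return schedule, remaining


comm_times = [5, 4, 3, 2]  # hypothetical per-bucket transfer times (ms)
windows = [7, 6]           # hypothetical compute windows (ms)
schedule, leftover = fill_windows(comm_times, windows)
```

Here the first window hides buckets 1 and 2 (4 + 3 = 7 ms), the second hides bucket 0, and bucket 3 stays exposed; relaxing dependencies, as DeFT does, is what would let such leftovers move into later windows.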

Multidimensional resource load-aware task migration in mobile edge computing

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-08-27 DOI: 10.1016/j.future.2025.108091

Chuangxin Li, Jixiao Li, Yongqiang Gao, Jiawei Song, Zhigang Wang

Volume 175, Article 108091

Abstract: With the increasing demands for low latency, high computational power, and distributed execution in applications such as autonomous driving, augmented reality (AR), and smart manufacturing, Mobile Edge Computing (MEC) has emerged as a solution that brings computation and storage closer to end users. In MEC environments, complex computational tasks are often structured as workflows consisting of multiple interdependent subtasks that require distributed execution across multiple MEC servers. However, such workflow tasks often involve complex data and temporal dependencies and, because of user mobility, require frequent task migration across MEC servers. Inefficient task migration can lead to increased execution delays, communication overhead, and server overload, ultimately degrading system performance and service quality. Therefore, effectively optimizing workflow task migration to ensure on-time task completion while balancing resource allocation among edge servers remains a key challenge in workflow scheduling for MEC environments. To address this challenge, this paper proposes a comprehensive strategy that integrates user trajectory prediction and federated deep reinforcement learning to jointly optimize workflow migration and resource allocation. A hybrid GRU-2LSTM model predicts user mobility trajectories, while a Federated Multi-Agent Deep Deterministic Policy Gradient (FMADDPG) algorithm optimizes task migration and resource allocation. Simulation results demonstrate that the proposed strategy reduces average load imbalance by 10%–20% and lowers workflow timeout rates by 7%–27%, highlighting its effectiveness in workflow scheduling, resource optimization, and overall system efficiency in dynamic MEC environments.

Citations: 0
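"Multidimensional resource load" can be made concrete with per-server load and capacity vectors (for example CPU and memory). The sketch below picks a migration target greedily, minimizing the peak utilization that any single resource dimension would reach after hosting the task, and measures imbalance as the population standard deviation of each server's mean utilization. Both the greedy rule and the imbalance metric are assumptions for illustration; the paper itself relies on GRU-2LSTM trajectory prediction and FMADDPG, which this sketch does not attempt to reproduce.

```python
from statistics import pstdev


def utilization(used, capacity):
    """Per-dimension utilization of one server (e.g., CPU, memory, bandwidth)."""
    return [u / c for u, c in zip(used, capacity)]


def pick_target(task_demand, servers):
    """Return the server whose peak per-dimension utilization after hosting the
    task is lowest, or None if no server has room in every dimension.

    servers: dict name -> (load_vector, capacity_vector)
    """
    best, best_peak = None, float("inf")
    for name, (used, cap) in servers.items():
        after = [u + d for u, d in zip(used, task_demand)]
        if any(a > c for a, c in zip(after, cap)):
            continue  # would overload at least one resource dimension
        peak = max(utilization(after, cap))
        if peak < best_peak:
            best, best_peak = name, peak
    return best


def imbalance(servers):
    """Population std-dev of each server's mean utilization across dimensions."""
    means = [sum(utilization(used, cap)) / len(cap) for used, cap in servers.values()]
    return pstdev(means)


# Hypothetical two-server edge cluster with [CPU, memory] in arbitrary units.
servers = {"edge_1": ([6, 2], [8, 8]), "edge_2": ([2, 2], [8, 8])}
target = pick_target([2, 1], servers)  # "edge_2": peak 0.5 vs 1.0 on edge_1
```

Scoring by the peak dimension rather than the average keeps a server from being chosen when one resource (here CPU on `edge_1`) would be saturated even though the others are idle.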

Multi-tree genetic programming for adaptive dynamic fault-tolerant task scheduling of satellite edge computing

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-08-27 DOI: 10.1016/j.future.2025.108099

Changzhen Zhang, Jun Yang

Volume 175, Article 108099

Abstract: Satellite Edge Computing (SEC) leverages Low Earth Orbit (LEO) satellites to provide real-time computing services globally. However, dynamic resource availability, heterogeneous task requirements, and frequent failures pose challenges to effective scheduling and fault tolerance. In this work, we propose a Genetic Programming Hyper-Heuristic (GPHH) method that simultaneously learns scheduling strategies and fault-tolerant strategies for the SEC system. First, we formulate a comprehensive problem model for joint dynamic task scheduling and fault tolerance in SEC, aiming to improve task success rates for computational tasks with heterogeneous service requirements. Second, we design a selection rule for fault-tolerant strategies that dynamically chooses between task resubmission and replication based on task attributes and real-time resource states. Finally, to ensure adaptive real-time decision-making in dynamic environments, we propose a Multi-Tree Genetic Programming (MTGP) method that automatically learns the routing rule, the queuing rule, and the selection rule for fault-tolerant strategies. Experimental results show that MTGP improves the task success rate by about 3%–40% across different scenarios compared with the baseline methods. Moreover, the three tree-based rules evolved by MTGP exhibit strong interpretability, effectively capturing the intricate correlations between scheduling and fault-tolerant strategies.

Citations: 0
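MTGP evolves its routing, queuing, and fault-tolerance selection rules as expression trees. The sketch below shows how such evolved trees are typically evaluated at decision time: terminals are task or resource features, internal nodes are operators, and the tree's output is either ranked (routing: the lowest priority value wins) or thresholded (selection: replicate if positive, otherwise resubmit). The feature names, the operator set, the example trees, and the zero threshold are all hypothetical, because the abstract does not give the paper's terminal set or decision conventions.

```python
import operator

# Function set for the (hypothetical) evolved trees.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul,
       "max": max, "min": min}


def evaluate(tree, features):
    """Evaluate a tree given as a feature name, a constant, or (op, left, right)."""
    if isinstance(tree, str):
        return features[tree]
    if isinstance(tree, (int, float)):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, features), evaluate(right, features))


def route(routing_rule, candidates):
    """Pick the candidate satellite whose rule value (priority) is lowest."""
    return min(candidates, key=lambda name: evaluate(routing_rule, candidates[name]))


def select_fault_tolerance(selection_rule, task_features):
    """Replicate when the selection rule is positive, otherwise plan resubmission."""
    return "replication" if evaluate(selection_rule, task_features) > 0 else "resubmission"


# Hypothetical evolved rules.
routing_rule = ("+", "queue_delay", ("*", 2, "link_delay"))
selection_rule = ("-", ("*", "failure_rate", "idle_satellites"), "deadline_slack")

candidates = {"sat_a": {"queue_delay": 3.0, "link_delay": 1.0},
              "sat_b": {"queue_delay": 1.0, "link_delay": 1.5}}
chosen = route(routing_rule, candidates)  # "sat_b": 4.0 vs 5.0
```

Because each rule is a small readable formula, inspecting it directly is what gives tree-based GP the interpretability the abstract highlights.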

Designing a QEMU plugin to profile multicore long vector RISC-V architectures: RAVE

IF 6.2 | Zone 2, Computer Science

Future Generation Computer Systems-The International Journal of Escience Pub Date: 2025-08-27 DOI: 10.1016/j.future.2025.108100

Pablo Vizcaino, Roger Ferrer, Jesus Labarta, Filippo Mantovani

Volume 175, Article 108100

Abstract: Simulators are crucial during the development of a chip, such as the RISC-V accelerator designed in the European Processor Initiative project. In this paper, we showcase the limitations of the project's current emulation solutions and propose using QEMU with RAVE, a plugin that we implement and describe in this document. This methodology can rapidly emulate and analyze applications running on the v1.0 and v0.7.1 RISC-V V-extension. Our plugin reports the vector and scalar instructions alongside useful information such as the vector length being used, the single-element-width, and register usage, among other vectorization metrics. We provide an API that the emulated application can call to control the RAVE plugin, as well as the capability to generate vectorization traces that can be analyzed with Paraver. Additionally, we demonstrate the efficiency of our solution against other emulators such as Whisper, ETISS, Spike, and FPGA simulation. Finally, we describe the challenges and solutions for synchronizing and emulating multi-threaded and multi-process binaries, and evaluate the performance of RAVE running on multiple processors.

Citations: 0
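Among other metrics, RAVE reports the vector length in use and the element width. In the ratified RVV v1.0 specification these quantities are linked by VLMAX = LMUL·VLEN/SEW. `vsetvli` grants vl = AVL whenever AVL ≤ VLMAX and vl = VLMAX whenever AVL ≥ 2·VLMAX, and for values in between the specification leaves some freedom. The sketch assumes the common implementation choice vl = min(AVL, VLMAX). It shows the sequence of vl values in a strip-mined loop and an average occupancy figure of the kind a vectorization profiler could derive. The VLEN value and the function names are hypothetical, not RAVE's API or the EPI accelerator's configuration.

```python
def vlmax(vlen_bits, sew_bits, lmul=1.0):
    """VLMAX = LMUL * VLEN / SEW, i.e. elements per (grouped) vector register."""
    return int(lmul * vlen_bits) // sew_bits


def strip_mine(avl_total, vlen_bits, sew_bits, lmul=1.0):
    """vl granted on each iteration of a strip-mined loop over avl_total elements,
    assuming the implementation choice vl = min(remaining, VLMAX)."""
    vmax = vlmax(vlen_bits, sew_bits, lmul)
    vls, remaining = [], avl_total
    while remaining > 0:
        vl = min(remaining, vmax)
        vls.append(vl)
        remaining -= vl
    return vls


def vl_occupancy(vls, vmax):
    """Average fraction of VLMAX actually used across the loop's vector iterations."""
    return sum(vls) / (len(vls) * vmax)


# Hypothetical 512-bit VLEN, 64-bit elements, LMUL = 1: VLMAX = 8.
vls = strip_mine(20, vlen_bits=512, sew_bits=64)  # [8, 8, 4]
```

A low occupancy from short tail iterations or small trip counts is exactly the kind of inefficiency that per-instruction vl reporting makes visible on long-vector hardware.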