Future Generation Computer Systems-The International Journal of Escience: Latest Articles

Adaptive CPU sharing for co-located latency-critical JVM applications and batch jobs under dynamic workloads
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108387. Pub Date: 2026-07-01 (Epub 2026-01-23). DOI: 10.1016/j.future.2026.108387
Dishi Xu, Fagui Liu, Bin Wang, Xuhao Tang, Qingbo Wu

Abstract: Latency-critical (LC) long-running applications operating on Java Virtual Machines (JLRAs) often rely on substantial CPU over-provisioning to meet Service-Level Objectives (SLOs) under dynamic workloads, leading to significant resource underutilization. JLRAs also exhibit poor cold-start performance, and frequently deleting and creating application instances to adjust resource allocation degrades performance. Furthermore, harvesting redundant resources by deploying best-effort (BE) batch jobs alongside JLRAs faces serious challenges due to contention for shared CPU resources. We therefore present ChaosRM, a bi-level resource management framework for JVM workload co-location that improves resource utilization while eliminating resource contention. In contrast to the conventional approach of isolating JLRAs and batch jobs on non-overlapping CPU sets, ChaosRM proposes a tri-zone CPU isolation mechanism: two CPU zones isolate JLRAs and batch jobs, and a shared zone concurrently executes threads from both. An application-wide, learning-based Application Manager adjusts the instance states of JLRAs based on the global workload and adaptively learns the shared-zone allocation strategy and a performance target expressed as thread queuing time; the Node Manager on each server heuristically binds CPU sets to JLRAs and dynamically schedules batch jobs among CPU zones according to this performance target and the JLRA instance states. Experimental results show that, while guaranteeing the SLOs of JLRAs, ChaosRM reduces the completion time of batch jobs by up to 14.10% over the best-performing baseline and up to 54.29% over all baselines.

Citations: 0
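The tri-zone idea can be illustrated with a small, hypothetical sketch: CPUs are partitioned into an LC zone, a BE zone, and a shared zone, and each task class is pinned to its own zone plus the shared one. Zone sizes, names, and the `bind` helper are our assumptions, not ChaosRM's actual interface; `os.sched_setaffinity` is Linux-only.

```python
import os

# Hypothetical tri-zone partition in the spirit of ChaosRM (sizes are
# illustrative): an LC zone reserved for JLRA threads, a BE zone for
# batch jobs, and a shared zone where both classes may run.
def make_zones(n_cpus, lc=4, be=2):
    cpus = list(range(n_cpus))
    return {
        "lc": set(cpus[:lc]),           # latency-critical JLRAs only
        "be": set(cpus[lc:lc + be]),    # best-effort batch jobs only
        "shared": set(cpus[lc + be:]),  # both classes may execute here
    }

def bind(pid, zones, kind):
    # A task may use its own zone plus the shared zone (Linux-only call).
    mask = zones[kind] | zones["shared"]
    os.sched_setaffinity(pid, mask)
```

A scheduler in this style would then grow or shrink the shared zone as the learned performance target (thread queuing time) drifts.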
IRL-D3QN: An intelligent multi-agent learning framework for dynamic spectrum management in vehicular networks
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108371. Pub Date: 2026-07-01 (Epub 2026-01-10). DOI: 10.1016/j.future.2026.108371
Jing Wang, Wenshi Dan, Ke Yang, Xing Tang, Lingyu Yan

Abstract: The proliferation of vehicular networks within intelligent transportation systems (ITS) has significantly increased the demand for efficient and adaptive spectrum resource allocation. Spectrum coordination is challenging due to dense vehicle traffic, intensive communication environments, and diversified service requirements. This is particularly significant in Vehicle-to-Everything (V2X) communications, where rapidly changing conditions call for robust solutions. Multi-agent reinforcement learning (MARL) techniques are promising and have been applied to dynamic spectrum access management, but overestimated value functions, unstable policy convergence, and dependence on manually designed rewards limit their practical applicability. This paper presents IRL-D3QN, a new spectrum-management framework that combines Inverse Reinforcement Learning (IRL) with a Dueling Double Deep Q-Network (D3QN). The algorithm uses a reward-prediction network that derives intrinsic motivation from the agent's interaction with the environment, eliminating the need for error-prone manual reward design and improving generalization across scenarios. The dueling network design stabilizes learning by separating state values from action advantages, while double Q-learning minimizes overestimation bias. Simulations demonstrate that IRL-D3QN improves the Vehicle-to-Infrastructure (V2I) transmission rate by 7.94% and suffers significantly less performance degradation under heavy communication loads than state-of-the-art RL algorithms, providing a scalable, self-sufficient solution for dynamic spectrum allocation in next-generation vehicular communication systems.

Citations: 0
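The overestimation fix the abstract mentions is the standard double Q-learning target, which D3QN-style agents use: the online network selects the next action, the target network evaluates it. A minimal sketch with illustrative values (not the paper's networks):

```python
import numpy as np

# Double Q-learning target: action selection (online net) is decoupled
# from action evaluation (target net), reducing overestimation bias.
def double_q_target(reward, q_online_next, q_target_next, gamma=0.9):
    a_star = int(np.argmax(q_online_next))         # chosen by online net
    return reward + gamma * q_target_next[a_star]  # valued by target net

q_online = np.array([1.0, 5.0, 2.0])  # online-net Q-values at next state
q_target = np.array([0.5, 3.0, 4.0])  # target-net Q-values at next state
target = double_q_target(1.0, q_online, q_target)  # 1.0 + 0.9 * 3.0 = 3.7
```

Plain max-based Q-learning would instead use max(q_target) = 4.0 here, illustrating the upward bias that decoupling avoids.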
A comparative performance and efficiency analysis of Apple's M architectures: A GEMM case study
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108393. Pub Date: 2026-07-01 (Epub 2026-01-24). DOI: 10.1016/j.future.2026.108393
Sandra Catalán, Rafael Rodríguez-Sánchez, Carlos García Sánchez, Luis Piñuel Moreno

Abstract: This paper evaluates the performance and energy efficiency of Apple processors across multiple ARM-based M-series generations and models (standard and Pro). The study is motivated by the increasing heterogeneity of Apple's SoC architectures, which integrate multiple computing engines, raising the question of which hardware components are best suited for executing general-purpose and domain-specific computations such as the GEneral Matrix Multiply (GEMM). The analysis focuses on four key components: the Central Processing Unit (CPU), the Graphics Processing Unit (GPU), the matrix calculation accelerator (AMX), and the Apple Neural Engine (ANE).

The assessments use GEMM as a benchmark to characterize the performance of the CPU and GPU, alongside tests on the AMX, which specializes in large-scale mathematical operations, and on the ANE, which is designed specifically for deep learning. Energy consumption data is also collected to analyze the energy efficiency of these resources. Results highlight notable improvements in computational capacity and energy efficiency over successive generations. On one hand, the AMX stands out as the most efficient component for FP32 and FP64 workloads, significantly boosting overall system performance: in the M4 Pro, which integrates two matrix accelerators, it achieves up to 68% of the GPU's FP32 performance while consuming only 42% of its power. On the other hand, the ANE, although limited to FP16 precision, excels in energy efficiency for low-precision tasks, surpassing the other accelerators with over 700 GFLOPs/Watt under batched workloads.

This analysis offers a clear understanding of how Apple's custom ARM designs optimize both performance and energy use, particularly in the context of multi-core processing and specialized acceleration units. A significant contribution of this study is its comprehensive comparative analysis of Apple's accelerators, which have previously been poorly documented and scarcely studied; the analysis spans different generations and compares the accelerators against both CPU and GPU performance.

Citations: 0
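The core measurement behind a GEMM case study is simple: time the multiply and convert to GFLOP/s using the 2mnk flop count. A hedged sketch (matrix size and repeat count are arbitrary; this benchmarks whatever BLAS NumPy links against, not Apple's engines specifically):

```python
import time
import numpy as np

# Time C = A @ B for n-by-n FP32 matrices and report GFLOP/s.
# A square GEMM performs 2 * n^3 floating-point operations.
def gemm_gflops(n=256, reps=5):
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    t0 = time.perf_counter()
    for _ in range(reps):
        c = a @ b
    dt = (time.perf_counter() - t0) / reps
    return (2 * n**3) / dt / 1e9  # GFLOP/s
```

Pairing such a loop with power sampling yields the GFLOPs/Watt figures the study compares across CPU, GPU, AMX, and ANE.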
AFMIS: An approximate floating-point multiplier based on input segmentation
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108377. Pub Date: 2026-07-01 (Epub 2026-01-18). DOI: 10.1016/j.future.2026.108377
Asma Naseri Rad, Shaghayegh Vahdat, Ali Afzali-Kusha, Massoud Pedram

Abstract: This paper proposes an approximate floating-point (FP) multiplier, called AFMIS, based on input segmentation. AFMIS statically divides the input mantissas into several segments and performs exact multiplication on the selected segments, eliminating the need for a costly leading-one detector (LOD) circuit. The static segmentation and limited segment count reduce the number of required post-multiplication shift values; with only a few possible shifts, a simple multiplexer can replace a full shifter, improving speed over dynamic segmentation approaches. The proposed structure allows adjustable accuracy levels by modifying the number of bits in each segment, making it suitable for a wide range of applications. To evaluate its efficiency, the hardware parameters of AFMIS are compared to those of an exact FP multiplier and several other approximate FP multipliers using Synopsys Design Compiler in a 7 nm technology. The results show that the proposed multiplier achieves a mean relative error distance (MRED) of 0.27% to 18.6% while improving delay, area, and power consumption by up to 81.7%, 98%, and 99%, respectively, compared to the exact FP multiplier. Furthermore, AFMIS outperforms other approximate FP multipliers in speed, area, and energy consumption at similar accuracy levels. Its utility is demonstrated in regression and classification tasks using neural networks (NNs) and in JPEG compression; in most cases, the output differences between AFMIS and the exact multiplier are negligible.

Citations: 0
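To see why operating on only part of the mantissa can still be accurate, here is an illustrative sketch, not the AFMIS design itself: zero all but the top `keep` mantissa bits of each float32 operand, multiply, and measure the relative error distance. The segmentation choice (8 bits) is an arbitrary assumption.

```python
import struct

# Keep only the sign, exponent, and top `keep` mantissa bits of a float32.
def truncate_mantissa(x, keep=8):
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << (23 - keep)) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]

# Toy "approximate multiplier": multiply the truncated operands exactly.
def approx_mul(a, b, keep=8):
    return truncate_mantissa(a, keep) * truncate_mantissa(b, keep)

exact = 3.1415927 * 2.7182818
approx = approx_mul(3.1415927, 2.7182818)
red = abs(approx - exact) / abs(exact)  # relative error distance
```

With 8 mantissa bits kept, each operand carries at most about 2^-9 relative error, so the product error stays well under 1%, the same accuracy/cost dial AFMIS exposes through its segment widths.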
BiD-Accel: Accelerated bidimensional input-aware SDC vulnerability assessment for GPU static instructions
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108372. Pub Date: 2026-07-01 (Epub 2026-01-08). DOI: 10.1016/j.future.2026.108372
Zhenyu Qian, Lianguo Wang, Pengfei Zhang, Jianing Rao

Abstract: Graphics Processing Units (GPUs) are increasingly used in safety-critical systems, where Silent Data Corruptions (SDCs) pose severe risks. Selective Instruction Duplication (SID) can mitigate these risks but relies on accurate static-instruction vulnerability assessment, which is complicated by variations in input values and sizes. This paper presents a comprehensive study of how input characteristics shape instruction-level SDC vulnerability, quantified using the Static Instruction Error Probability (SIEP) and the SDC Occurrence rate (SDCO). We extend gpuFI-4 to enable fault-injection mapping at the static-instruction level. Across 14 benchmarks and more than ten million single-, double-, and triple-bit injections, we find that SIEP is largely value-insensitive, whereas SDCO is highly value-sensitive. For register instructions, SDCO remains stable for random and structured-sparse inputs but differs markedly for all-zero, NaN, or denormal inputs. Moreover, when SIEP is size-sensitive, SDCO also tends to exhibit size sensitivity. We further observe that invalid-injection rates decrease with input size and that shared-memory instructions, though few, can contribute disproportionately to SDCs. Leveraging these insights, we propose BiD-Accel, a bi-dimensional, input-aware framework for accelerated static-instruction SDC vulnerability assessment. Its SIEP-driven Descending Order Sort (DOS) method achieves stable SDCO rankings with injections on only 70.4% of instructions on average, compared with 86.2% for the Random Ordering (RO) method, meaningfully reducing assessment cost while preserving ranking fidelity and providing actionable guidance for robust SID under input-varying GPU workloads.

Citations: 0
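The basic fault-injection primitive behind such studies can be sketched in a few lines: flip one bit of a float32 operand and check whether the downstream result silently changes. This is a toy CPU-side illustration, not the gpuFI-4 methodology; bit positions and the workload are invented for demonstration.

```python
import struct

# Flip bit `pos` (0 = mantissa LSB, 23-30 = exponent, 31 = sign) of a
# float32 value, modeling a single-bit hardware fault.
def flip_bit(x, pos):
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits ^ (1 << pos)))[0]

golden = 1.0 + 3.0                    # fault-free result: 4.0
faulty = flip_bit(1.0, 0) + 3.0       # inject into one operand
sdc = faulty != golden                # silent data corruption observed?
```

Note how severity depends on bit position: a mantissa-LSB flip perturbs 1.0 by only 2^-23, while flipping the exponent LSB turns 1.0 into 0.5, which is why per-bit and per-instruction campaigns are needed to rank vulnerability.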
Reliability analysis of hardware accelerators for decision tree-based classifier systems
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108378. Pub Date: 2026-07-01 (Epub 2026-01-20). DOI: 10.1016/j.future.2026.108378
Mario Barbareschi, Salvatore Barone, Alberto Bosio, Antonio Emmanuele

Abstract: The increasing adoption of AI models has driven applications toward hardware accelerators to meet high computational demands and strict performance requirements. Beyond performance and energy efficiency, explainability and reliability have emerged as pivotal requirements, particularly for critical applications such as automotive, medical, and aerospace systems. Among AI models, Decision Tree Ensembles (DTEs) are particularly notable for their high accuracy and explainability. They are also well suited to hardware implementation, enabling high performance and improved energy efficiency. However, a frequently overlooked aspect of DTEs is their reliability in the presence of hardware malfunctions. While DTEs are generally regarded as robust by design, thanks to their redundancy and voting mechanisms, hardware faults can still have catastrophic consequences. To address this gap, we present an in-depth reliability analysis of two types of DTE hardware accelerators: classical and approximate implementations. Specifically, we conduct a comprehensive fault-injection campaign, varying the number of trees involved in the classification task, the approximation technique used, and the tolerated accuracy loss, while evaluating several benchmark datasets. The results demonstrate that approximation techniques must be carefully designed, as they can significantly impact resilience. However, techniques that target the representation of features and thresholds appear better suited for fault tolerance.

Citations: 0
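The "robust by design" intuition the abstract questions rests on majority voting: a fault that corrupts one tree's prediction is often masked by the rest of the ensemble. A minimal sketch (class labels and ensemble size are arbitrary):

```python
from collections import Counter

# A DTE's final class is the majority vote over per-tree predictions.
def majority_vote(predictions):
    return Counter(predictions).most_common(1)[0][0]

fault_free = ["cat", "cat", "cat", "dog", "cat"]
faulty = ["cat", "dog", "cat", "dog", "cat"]  # one tree corrupted by a fault
masked = majority_vote(fault_free) == majority_vote(faulty)
```

The paper's point is that this masking is not absolute: faults in shared logic, such as feature and threshold representations, can sway many trees at once, which is why those structures dominate the resilience results.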
Online 3D trajectory and resource optimization for dynamic UAV-assisted MEC systems
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108389. Pub Date: 2026-07-01 (Epub 2026-01-24). DOI: 10.1016/j.future.2026.108389
Zhao Tong, Shiyan Zhang, Jing Mei, Can Wang, Keqin Li

Abstract: The integration of unmanned aerial vehicles (UAVs) with mobile edge computing (MEC) provides users with more flexible, reliable, and high-quality computing services. However, most UAV-assisted MEC designs focus on static environments, which do not reflect the practical scenarios considered in this work. In this paper, we consider a UAV-assisted MEC platform that provides continuous service to multiple mobile ground users with random movements and task arrivals. We investigate the long-term system utility maximization problem in UAV-assisted MEC systems, considering continuous task offloading, user mobility, the UAV's 3D trajectory control, and resource allocation. To address the challenges of limited system information, high-dimensional continuous actions, and state-space approximation, we propose an Online decision-making algorithm for Dynamic environments based on Exploration-enhanced Greedy DDPG (ODEGD). To evaluate the algorithm more accurately, we introduce real-world road data into the experiments. Experimental results show that the proposed algorithm reduces response delay by 26.98% and energy consumption by 22.61% compared to other algorithms, while achieving the highest system utility. These results validate the applicability of the ODEGD algorithm under dynamic conditions, demonstrating its robustness and scalability.

Citations: 0
A task-based data-flow methodology for programming heterogeneous systems with multiple accelerator APIs
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108383. Pub Date: 2026-07-01 (Epub 2026-01-24). DOI: 10.1016/j.future.2026.108383
Aleix Boné, Alejandro Aguirre, David Álvarez, Pedro J. Martinez-Ferrer, Vicenç Beltran

Abstract: Heterogeneous nodes that combine multi-core CPUs with diverse accelerators are rapidly becoming the norm in both high-performance computing (HPC) and AI infrastructures. Exploiting these platforms, however, requires orchestrating several low-level accelerator APIs such as CUDA, SYCL, and Triton, sometimes in combination with optimized vendor math libraries such as cuBLAS and oneAPI. Each API or library introduces its own abstractions, execution semantics, and synchronization mechanisms, so combining them within a single application is error-prone and labor-intensive. We propose reusing a task-based data-flow methodology together with Task-Aware APIs (TA-libs) to overcome these limitations and facilitate the seamless integration of multiple accelerator programming models, while still leveraging the best-in-class kernels offered by each API.

Applications are expressed as a directed acyclic graph (DAG) of host tasks and device kernels managed by an OpenMP/OmpSs-2 runtime. We introduce Task-Aware SYCL (TASYCL) and leverage Task-Aware CUDA (TACUDA), which elevate individual accelerator invocations to first-class tasks. When multiple native runtimes coexist on the same multi-core CPU, they contend for threads, leading to oversubscription and performance variability. To address this, we unify their thread management under the nOS-V tasking and threading library, to which we contribute a new port of the PoCL (Portable OpenCL) runtime.

The methodology is evaluated on a multi-core server and a GPU-accelerated node using two contrasting workloads: the GPT-2 pre-training phase, representative of modern AI pipelines, and the HPCCG conjugate-gradient benchmark, representative of traditional HPC. From a performance standpoint, monolithic-kernel and fork-join executions are comparable, in both execution time and memory footprint, to a coarse-grained task-based formulation on both GPU-accelerated and multi-core systems. On the latter, unifying all runtimes through nOS-V mitigates interference and delivers performance on par with using a single runtime in isolation.

These results demonstrate that task-aware libraries, coupled with the nOS-V library, enable a single application to harness multiple accelerator programming models transparently and efficiently. The proposed methodology is immediately applicable to current heterogeneous nodes and readily extensible to future systems that integrate even richer combinations of CPUs, GPUs, FPGAs, and AI accelerators.

Citations: 0
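The task-based data-flow model can be sketched abstractly: tasks declare their data dependencies, and a runtime executes them in dependency order, so kernels from different APIs interleave safely without manual synchronization. The task names below are invented for illustration; the paper's runtime is OpenMP/OmpSs-2, not Python's `graphlib`.

```python
from graphlib import TopologicalSorter

# A DAG of host tasks and device kernels: edges are data dependencies.
# Two kernels from different APIs depend on the same input transfer and
# feed a common reduction; the runtime may run them concurrently.
deps = {
    "copy_in": set(),
    "cuda_kernel": {"copy_in"},
    "sycl_kernel": {"copy_in"},
    "reduce": {"cuda_kernel", "sycl_kernel"},
}
order = list(TopologicalSorter(deps).static_order())
```

A task-aware runtime goes further than this static ordering by mapping each ready task onto the appropriate native API stream, which is what TASYCL and TACUDA provide.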
HP2C-DT: High-Precision High-Performance Computer-enabled Digital Twin
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108333. Pub Date: 2026-07-01 (Epub 2025-12-26). DOI: 10.1016/j.future.2025.108333
E. Iraola, M. García-Lorenzo, F. Lordan-Gomis, F. Rossi, E. Prieto-Araujo, R.M. Badia

Abstract: Digital twins are transforming the way we monitor, analyze, and control physical systems, but designing architectures that balance real-time responsiveness with heavy computational demands remains a challenge. Cloud-based solutions often struggle with latency and resource constraints, while edge-based approaches lack the processing power for complex simulations and data-driven optimizations.

To address this problem, we propose the High-Precision High-Performance Computer-enabled Digital Twin (HP2C-DT) reference architecture, which integrates High-Performance Computing (HPC) into the computing continuum. Unlike traditional setups that use HPC only for offline simulations, HP2C-DT makes it an active part of digital twin workflows, dynamically assigning tasks to edge, cloud, or HPC resources based on urgency and computational needs.

To bridge the gap between theory and practice, we introduce the HP2C-DT framework, a working implementation that uses COMPSs for seamless workload distribution across diverse infrastructures. We test it in a power-grid use case, showing how it reduces communication bandwidth by an order of magnitude through edge-side data aggregation, improves response times by up to 2x via dynamic offloading, and maintains near-ideal strong scaling for compute-intensive workflows across a practical range of resources. These results demonstrate how an HPC-driven approach can push digital twins beyond their current limitations, making them smarter, faster, and more capable of handling real-world complexity.

Citations: 0
Striking the balance between speed and compression ratio: A fast bit-grouping algorithm and adaptive compressor selection for scientific data
IF 6.2 | CAS Zone 2, Computer Science
Future Generation Computer Systems-The International Journal of Escience, Vol. 180, Article 108370. Pub Date: 2026-07-01 (Epub 2026-01-10). DOI: 10.1016/j.future.2026.108370
Michael Middlezong

Abstract: High-performance computing (HPC) systems have enabled unprecedented advances in scientific simulation, producing ever larger quantities of data to be analyzed. The resulting storage and I/O overheads present a significant bottleneck to scientific workflows. While many compression algorithms have been developed to address the issue, achieving the optimal balance between compression ratio and throughput remains a challenge, and strict error-bound requirements are inadequately addressed by current solutions. This paper introduces GRASP, a fast bit-grouping compressor that leverages the local smoothness of data to achieve high throughput while maintaining competitive compression ratios under tight error constraints. For compressor selection, we also propose a novel efficiency metric that considers both compression and I/O performance, allowing the user to make an informed decision about which compressor to use, and we develop an adaptive compression selection framework based on this metric, using sampling to determine at runtime the optimal compressor for specific use cases. Experimental results across six diverse datasets demonstrate that GRASP outperforms traditional error-bounded compressors such as SZ3 and ZFP in speed while achieving similar compression ratios under tight error bounds. Additionally, we assess scenarios in which naive compressor selection fails to choose the optimal compressor, demonstrating the importance of an adaptive selection framework. These contributions provide a practical approach to balancing speed and compression ratio in modern scientific data management.

Citations: 0
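Why a combined compression-plus-I/O metric matters can be shown with a back-of-the-envelope model. The formula below is our illustration of the idea, not the paper's exact metric: end-to-end throughput is the data size divided by the sum of compression time and the time to write the reduced output.

```python
# Effective throughput (GB/s) of "compress then write": a fast, loose
# compressor and a slow, tight one trade places as I/O bandwidth changes.
def end_to_end_gbps(size_gb, comp_gbps, ratio, io_gbps):
    t_compress = size_gb / comp_gbps         # time spent compressing
    t_write = (size_gb / ratio) / io_gbps    # time writing reduced data
    return size_gb / (t_compress + t_write)  # effective throughput

# With fast I/O (1 GB/s), the fast compressor wins despite a worse ratio:
fast = end_to_end_gbps(10, comp_gbps=2.0, ratio=4, io_gbps=1.0)
tight = end_to_end_gbps(10, comp_gbps=0.5, ratio=10, io_gbps=1.0)
```

With I/O slowed to 0.05 GB/s the ranking flips, which is exactly the situation where a runtime, sampling-based selector earns its keep.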