{"title":"Performance and efficiency: A multi-generational benchmark of modern processors on bandwidth-bound HPC applications","authors":"Balázs Drávai, István Z. Reguly","doi":"10.1016/j.future.2025.107793","DOIUrl":"10.1016/j.future.2025.107793","url":null,"abstract":"<div><div>The last two years have seen the launch of a multitude of new x86 processors in reaction to market demand. Intel has launched four families of Xeon processors with some novel architectural features: first the Sapphire Rapids generation, which featured a version with on-package HBM, then the Emerald Rapids generation, and finally a differentiated lineup consisting of the performance-oriented Granite Rapids and the efficiency-oriented Sierra Forest families. In this work, we evaluate the performance and energy efficiency of CPUs from each of the different generations and variants of Intel and AMD CPUs, with a particular focus on bandwidth-bound high performance computing (HPC) applications. We contrast runtime and energy consumption figures and track trends across generations. We furthermore study how enabling locality-improving optimizations increases cache reuse and overall performance while reducing energy use.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"169 ","pages":"Article 107793"},"PeriodicalIF":6.2,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143578089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"zCeph: Design and implementation of a ZNS-friendly distributed file system","authors":"Jin Yong Ha , Yongseok Son","doi":"10.1016/j.future.2025.107763","DOIUrl":"10.1016/j.future.2025.107763","url":null,"abstract":"<div><div>This article presents <span>zCeph</span>, a ZNS-friendly distributed file system designed to efficiently utilize zoned namespace (ZNS) SSDs. Specifically, we first propose <span>MZAllocator</span> which enables multiple zones to be utilized simultaneously to maximize the performance of ZNS SSDs. Second, we adopt an <span>append</span> command to eliminate the need for synchronization in write ordering within distributed storage systems to improve scalability. Third, we present <span>zBlueFS</span>, a ZNS-aware user-level file system based on BlueFS to update the metadata on the ZNS SSD without a conventional SSD. Finally, we propose a delta write technique, <span>DeltaWriter</span>, which writes only a modified part of the metadata (i.e., onode) to reduce read–modify–write overhead whenever the metadata are updated. We implement <span>zCeph</span> with four techniques based on Ceph, an open-source distributed file system. 
Further, we evaluate <span>zCeph</span> on a pair of 48-core machines with ZNS SSDs using micro and macro benchmarks, and the results reveal that <span>zCeph</span> improves performance by up to 4.2<span><math><mo>×</mo></math></span> and 8.8<span><math><mo>×</mo></math></span> compared with Ceph, respectively.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"169 ","pages":"Article 107763"},"PeriodicalIF":6.2,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143592145","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated adaptive pruning with differential privacy","authors":"Zhousheng Wang , Jiahe Shen , Hua Dai , Jian Xu , Geng Yang , Hao Zhou","doi":"10.1016/j.future.2025.107783","DOIUrl":"10.1016/j.future.2025.107783","url":null,"abstract":"<div><div>Federated Learning (FL), as an emerging distributed machine learning technique, reduces the computational burden on the central server through decentralization, while ensuring data privacy. It typically requires client sampling and local training for each iteration, followed by aggregation of the model on a central server. Although this distributed learning approach has positive implications for the preservation of privacy, it also increases the computational load of local clients. Therefore, lightweight efficient schemes become an indispensable tool to help reduce communication and computational costs in FL. In addition, due to the risk of model stealing attacks when uploaded, it is urgent to improve the level of privacy protection further. In this paper, we propose Federated Adaptive Pruning (FAP), a lightweight method that integrates FL with adaptive pruning by adjusting explicit regularization. We keep the model unchanged, but instead try to dynamically prune the data from large datasets during the training process to reduce the computational costs and enhance privacy protection. In each round of training, selected clients train with their local data and prune a portion of the data before uploading the model for server-side aggregation. The remaining data are reserved for subsequent computations. With this approach, selected clients can quickly refine their data at the beginning of training. In addition, we combine FAP with differential privacy to further strengthen data privacy. Through comprehensive experiments, we demonstrate the performance of FAP on different datasets with basic models, <em>e.g.</em>, CNN, and MLP, just to mention a few. 
Experimental results show that our method can significantly prune the datasets, reducing computational overhead with minimal loss of accuracy. Compared with previous methods, it obtains the lowest training error and further improves client-side data privacy.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"169 ","pages":"Article 107783"},"PeriodicalIF":6.2,"publicationDate":"2025-03-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cross-platform and polyhedral programming for Nussinov RNA folding","authors":"Mateusz Gruzewski, Marek Palkowski","doi":"10.1016/j.future.2025.107786","DOIUrl":"10.1016/j.future.2025.107786","url":null,"abstract":"<div><div>This article addresses the use of codes from polyhedral compilers with tiled and parallel code designed for CPU processors, automatically generated as source-to-source OpenMP for NVIDIA GPU graphics cards using CUDA. In previous publications, we demonstrated that it is possible to use large language models (LLM) to translate code, generate kernels, and correctly manage memory transfers between the host and the device without manual effort. Unfortunately, when the target architecture is not taken into account in detail, the performance of code designed for CPUs leaves much to be desired when running on GPUs. The architectural differences between these two platforms like cores, cache, and the dimensionality of computations require careful attention to performance portability. In this article, we address the Nussinov algorithm, a popular benchmark in bioinformatics, to achieve higher performance on the NVIDIA platform than automatically generated codes by LLM. Nussinov’s loop nests are a non-trivial kernel from the non-serial polyadic dynamic programming (NPDP) benchmark with non-uniform loops. We will utilize a polyhedral code framework that tiles and then manually modifies the most nested loop nest containing the majority of the computations, using the two-dimensional thread blocks. To accelerate the computations, shared memory within blocks is utilized. The resulting codes were tested on two modern NVIDIA devices for various RNA sequence lengths, compared to parallel and tiled CPU codes, and previously generated Nussinov’s GPU codes using LLMs. The correctness of these codes and their scalability were analyzed. 
Comparison to related approaches and future work are outlined.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"169 ","pages":"Article 107786"},"PeriodicalIF":6.2,"publicationDate":"2025-03-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143591462","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complex network knowledge-based field programmable gate arrays routing congestion prediction","authors":"Tingyuan Nie, Pengfei Liu, Kun Zhao, Zhenhao Wang","doi":"10.1016/j.future.2025.107776","DOIUrl":"10.1016/j.future.2025.107776","url":null,"abstract":"<div><div>Routing congestion occurs due to the unprecedented complexity of FPGA (field programmable gate array) designs. Accurately predicting the congestion of today’s FPGAs at an early stage helps to reduce the burden of later design and optimization. This paper proposes an innovative complex network knowledge-based approach to predict FPGA routing congestion during the placement stage. The complex network features and circuit features highly correlated with routing congestion are mapped into RGB (red-green-blue) images according to the pre-perceived importance of the features to feed to the proposed model. A patched EDM (elucidating the design space of a diffusion-based generative model) with a patch transformation is introduced to focus on the most significant features.</div><div>Experimental results show the remarkable achievements of the approach with an average SSIM (structural similarity) of 85.01 %, PSNR (peak signal-to-noise Ratio) of 27.85 dB (decibels), NRMS (normalized root mean square) of 12.91 %, and PIX (pixel accuracy) of 18.73 %, outperforming the recent state-of-the-art models like pix2pix, pix2pixHD, FCN (fully convolutional networks), and Lay-Net, improved by 4.87 %, 2.83 %, 5.77 %, and 18.56 % on key metric SSIM, respectively. The ablation validation highlights the efficiency of complex network features in routing congestion prediction. 
The outcome enables the identification of potential routing congestion in early design stages, facilitating the optimization solution of subsequent tractable routing problems.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"168 ","pages":"Article 107776"},"PeriodicalIF":6.2,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMGCSyn: Explainable synergistic drug combination prediction based on multimodal fusion","authors":"Yongqing Zhang , Hao Yuan , Yuhang Liu , Shuwen Xiong , Zhigan Zhou , Yugui Xu , Xinyu Mao , Meiqin Gong","doi":"10.1016/j.future.2025.107784","DOIUrl":"10.1016/j.future.2025.107784","url":null,"abstract":"<div><div>Synergistic drug combinations are an effective solution for treating complex diseases. The main challenge is to improve model performance on the unknown drug combination prediction task. Because some drugs are wholly excluded from the dataset, it is difficult for the model to effectively extract the data features of these drugs, which affects the model’s accuracy and generalization ability. Unlike previous methods, we propose an interpretable synergistic drug combination prediction model, MMGCSyn, based on multimodal feature fusion. The process takes any (drug, drug, cell line) triple as input. For drug features, a graph attention network extracts drug molecular graph features, a deformable convolutional network extracts drug Morgan fingerprint features, and a spatial feature reconstruction module suppresses redundancy in the Morgan fingerprint features. A multi-layer perceptron (MLP) extracts the cell line features. Subsequently, feature fusion and prediction are performed by a Transformer. We compared MMGCSyn against five existing methods on three drug combination datasets. 
The results show that MMGCSyn has achieved the best results and can effectively capture the chemical substructures of drug molecules.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"168 ","pages":"Article 107784"},"PeriodicalIF":6.2,"publicationDate":"2025-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ARGO: Overcoming hardware dependence in distributed learning","authors":"Karim Boubouh , Amine Boussetta , Rachid Guerraoui , Alexandre Maurer","doi":"10.1016/j.future.2025.107778","DOIUrl":"10.1016/j.future.2025.107778","url":null,"abstract":"<div><div>Mobile devices offer a valuable resource for distributed learning alongside traditional computers, encouraging energy efficiency and privacy through local computations. However, the hardware limitations of these devices make it impossible to use classical SGD for industry-grade machine learning models (with a very large number of parameters). Moreover, they are intermittently available and susceptible to failures. To address these challenges, we introduce <span>ARGO</span>, an algorithm that combines adaptive workload schemes with Byzantine resilience mechanisms, as well as dynamic device participation. Our theoretical analysis demonstrates linear convergence for strongly convex losses and sub-linear convergence for non-convex losses, without assuming specific dataset partitioning (for potential data heterogeneity). Our formal analysis highlights the interplay between convergence properties, hardware capabilities, Byzantine impact, and standard factors such as mini-batch size and learning rate. 
Through extensive evaluations, we show that <span>ARGO</span> outperforms standard SGD in terms of convergence speed and accuracy, and most importantly, thrives when classical SGD is not possible due to hardware limitations.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"168 ","pages":"Article 107778"},"PeriodicalIF":6.2,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548535","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PIM-IoT: Enabling hierarchical, heterogeneous, and agile Processing-in-Memory in IoT systems","authors":"Kan Zhong , Qiao Li , Ao Ren , Yujuan Tan , Xianzhang Chen , Linbo Long , Duo Liu","doi":"10.1016/j.future.2025.107782","DOIUrl":"10.1016/j.future.2025.107782","url":null,"abstract":"<div><div>The Internet of Things (IoT) is an emerging concept that senses the physical world by connecting various “things” and objects to the Internet. Conventional cloud-based IoT systems are unlikely to keep up with the diverse needs of IoT applications and have some issues, such as privacy and latency. Edge computing based IoT systems solve these issues by placing data processing and inference tasks near the data source. However, due to the increasing complexity of IoT applications, performing data processing and inference tasks in edge computing based IoT systems can lead to high energy consumption and latency.</div><div>Processing-in-Memory (PIM) is a promising solution to reduce the energy consumption of data processing and inference tasks by closely integrating computational logic with memory devices. Therefore, in this paper, we propose <strong>PIM-IoT</strong>, an IoT system built on PIM architectures to reduce energy consumption. To accommodate various data processing tasks, we architect PIM-IoT as a hierarchical system that consists of 3 tiers: <em>sensing tier</em>, <em>gateway tier</em>, and <em>edge computing tier</em>. We first analyze the dataflow of typical IoT applications and map tasks to different tiers. To handle the data processing and inference tasks effectively in each tier, we then propose hierarchical, heterogeneous, and collaborative PIM architectures for each tier. Finally, we show how the multiple tiers can be co-optimized under latency and power constraints. To our knowledge, this is the first work to explore novel PIM architectures in IoT systems. 
Detailed analysis and experimental results show that PIM-IoT can achieve 5.6x performance improvement and 6x energy consumption reduction for IoT applications.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"169 ","pages":"Article 107782"},"PeriodicalIF":6.2,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143563729","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Federated workload-aware quantized framework for secure learning in data-sensitive applications","authors":"Manu Narula , Jasraj Meena , Dinesh Kumar Vishwakarma","doi":"10.1016/j.future.2025.107772","DOIUrl":"10.1016/j.future.2025.107772","url":null,"abstract":"<div><div>Federated Learning (FL) emerged as a leading secure, distributed learning technology based on sharing insights instead of data. The privacy-ensuring capability of FL has enabled its extensive use in Data-Sensitive Applications like healthcare and finance. However, the transmitted insights are at risk of leakage as the security of the medium cannot be guaranteed and can lead to the inference of the user data. Quantization is sometimes used to change these transmitted values to provide security but at the cost of accuracy loss in global models. Coupled with client dropouts, this increases performance loss. In this paper, we propose a Federated Workload-Aware Framework with Linear Quantization (Fed-WALQ), which layers the quantization process with an active client-selection technique based on the sustainable workload of the clients. The framework minimizes the dropout rates and compensates for the loss due to quantization. Through numerical experiments compared against traditional FL and Quantization-enabled FL over multiple datasets, the Fed-WALQ shows improvements in security over the former and accuracy over the latter. 
The accuracy improvement varies with the complexities of the involved datasets, while a substantial drop in straggler node percentages is seen in all cases (up to 91.8% drop).</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"168 ","pages":"Article 107772"},"PeriodicalIF":6.2,"publicationDate":"2025-02-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VeriTrac: Verifiable and traceable cross-silo federated learning","authors":"Yanxin Xu , Hua Zhang , Zhenyan Liu , Fei Gao , Lei Qiao","doi":"10.1016/j.future.2025.107780","DOIUrl":"10.1016/j.future.2025.107780","url":null,"abstract":"<div><div>Cross-silo federated learning enables many clients to train a machine learning model collaboratively while keeping the raw training data local. It faces the risks of privacy leakage and malicious participants. In this paper, we introduce a new security risk: malicious clients may disrupt the training process of cross-silo federated learning by falsifying the verification evidence. The verification failure caused by this malicious behavior is not easily distinguishable from that caused by a malicious server falsifying the aggregated model. To address this issue, we design VeriTrac, the first privacy-preserving cross-silo federated learning scheme that supports verifiability and traceability. Before performing the aggregation, the server can utilize the non-private information of clients to verify messages submitted by them to avoid being framed. When the proportion of malicious clients is less than 50%, malicious participants causing the verification error can be traced. In addition, to verify the correctness of the aggregated models, a model vector with a verification factor is constructed and encrypted. The vector is confidential for the server, and the factor is part of the verification evidence and recoverable for clients. Security analysis shows that VeriTrac can guarantee the tracing of malicious participants and the data security of clients. 
Experimental evaluation shows that computation efficiency and communication efficiency of VeriTrac are acceptable.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"168 ","pages":"Article 107780"},"PeriodicalIF":6.2,"publicationDate":"2025-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143548536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}