{"title":"Advancing Hybrid Defense for Byzantine Attacks in Federated Learning","authors":"Kai Yue, Richeng Jin, Chau-Wai Wong, Huaiyu Dai","doi":"arxiv-2409.06474","DOIUrl":"https://doi.org/arxiv-2409.06474","url":null,"abstract":"Federated learning (FL) enables multiple clients to collaboratively train a\u0000global model without sharing their local data. Recent studies have highlighted\u0000the vulnerability of FL to Byzantine attacks, where malicious clients send\u0000poisoned updates to degrade model performance. Notably, many attacks have been\u0000developed targeting specific aggregation rules, whereas various defense\u0000mechanisms have been designed for dedicated threat models. This paper studies\u0000the resilience of an attack-agnostic FL scenario, where the server lacks prior\u0000knowledge of both the attackers' strategies and the number of malicious clients\u0000involved. We first introduce a hybrid defense against state-of-the-art attacks.\u0000Our goal is to identify a general-purpose aggregation rule that performs well\u0000on average while also avoiding worst-case vulnerabilities. By adaptively\u0000selecting from available defenses, we demonstrate that the server remains\u0000robust even when confronted with a substantial proportion of poisoned updates.\u0000To better understand this resilience, we then assess the attackers' capability\u0000using a proxy called client heterogeneity. We also emphasize that the existing\u0000FL defenses should not be regarded as secure, as demonstrated through the newly\u0000proposed Trapsetter attack. 
The proposed attack outperforms other\u0000state-of-the-art attacks by further reducing the model test accuracy by 8-10%.\u0000Our findings highlight the ongoing need for the development of\u0000Byzantine-resilient aggregation algorithms in FL.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"50 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211093","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Workload Placement on Multi-Instance GPUs","authors":"Bekir Turkkan, Pavankumar Murali, Pavithra Harsha, Rohan Arora, Gerard Vanloo, Chandra Narayanaswami","doi":"arxiv-2409.06646","DOIUrl":"https://doi.org/arxiv-2409.06646","url":null,"abstract":"There is an urgent and pressing need to optimize usage of Graphical\u0000Processing Units (GPUs), which have arguably become one of the most expensive\u0000and sought after IT resources. To help with this goal, several of the current\u0000generation of GPUs support a partitioning feature, called Multi-Instance GPU\u0000(MIG) to allow multiple workloads to share a GPU, albeit with some constraints.\u0000In this paper we investigate how to optimize the placement of Large Language\u0000Model (LLM)-based AI Inferencing workloads on GPUs. We first identify and\u0000present several use cases that are encountered in practice that require\u0000workloads to be efficiently placed or migrated to other GPUs to make room for\u0000incoming workloads. The overarching goal is to use as few GPUs as possible and\u0000to further minimize memory and compute wastage on GPUs that are utilized. We\u0000have developed two approaches to address this problem: an optimization method\u0000and a heuristic method. We benchmark these with two workload scheduling\u0000heuristics for multiple use cases. Our results show up to 2.85x improvement in\u0000the number of GPUs used and up to 70% reduction in GPU wastage over baseline\u0000heuristics. 
We plan to enable the SRE community to leverage our proposed method\u0000in production environments.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"410 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211151","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Parallel Reduced Order Modeling for Digital Twins using High-Performance Computing Workflows","authors":"S. Ares de Parga, J. R. Bravo, N. Sibuet, J. A. Hernandez, R. Rossi, Stefan Boschert, Enrique S. Quintana-Ortí, Andrés E. Tomás, Cristian Cătălin Tatu, Fernando Vázquez-Novoa, Jorge Ejarque, Rosa M. Badia","doi":"arxiv-2409.09080","DOIUrl":"https://doi.org/arxiv-2409.09080","url":null,"abstract":"The integration of Reduced Order Models (ROMs) with High-Performance\u0000Computing (HPC) is critical for developing digital twins, particularly for\u0000real-time monitoring and predictive maintenance of industrial systems. This\u0000paper describes a comprehensive, HPC-enabled workflow for developing and\u0000deploying projection-based ROMs (PROMs). We use PyCOMPSs' parallel framework to\u0000efficiently execute ROM training simulations, employing parallel Singular Value\u0000Decomposition (SVD) algorithms such as randomized SVD, Lanczos SVD, and full\u0000SVD based on Tall-Skinny QR. In addition, we introduce a partitioned version of\u0000the hyper-reduction scheme known as the Empirical Cubature Method. Despite the\u0000widespread use of HPC for PROMs, there is a significant lack of publications\u0000detailing comprehensive workflows for building and deploying end-to-end PROMs\u0000in HPC environments. Our workflow is validated through a case study focusing on\u0000the thermal dynamics of a motor. The PROM is designed to deliver a real-time\u0000prognosis tool that could enable rapid and safe motor restarts post-emergency\u0000shutdowns under different operating conditions for further integration into\u0000digital twins or control systems. 
To facilitate deployment, we use the HPC\u0000Workflow as a Service strategy and Functional Mock-Up Units to ensure\u0000compatibility and ease of integration across HPC, edge, and cloud environments.\u0000The outcomes illustrate the efficacy of combining PROMs and HPC, establishing a\u0000precedent for scalable, real-time digital twin applications across multiple\u0000industries.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"40 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NeurLZ: On Systematically Enhancing Lossy Compression Performance for Scientific Data based on Neural Learning with Error Control","authors":"Wenqi Jia, Youyuan Liu, Zhewen Hu, Jinzhen Wang, Boyuan Zhang, Wei Niu, Junzhou Huang, Stavros Kalafatis, Sian Jin, Miao Yin","doi":"arxiv-2409.05785","DOIUrl":"https://doi.org/arxiv-2409.05785","url":null,"abstract":"Large-scale scientific simulations generate massive datasets that pose\u0000significant challenges for storage and I/O. While traditional lossy compression\u0000techniques can improve performance, balancing compression ratio, data quality,\u0000and throughput remains difficult. To address this, we propose NeurLZ, a novel\u0000cross-field learning-based and error-controlled compression framework for\u0000scientific data. By integrating skipping DNN models, cross-field learning, and\u0000error control, our framework aims to substantially enhance lossy compression\u0000performance. Our contributions are three-fold: (1) We design a lightweight\u0000skipping model to provide high-fidelity detail retention, further improving\u0000prediction accuracy. (2) We adopt a cross-field learning approach to\u0000significantly improve data prediction accuracy, resulting in a substantially\u0000improved compression ratio. (3) We develop an error control approach to provide\u0000strict error bounds according to user requirements. We evaluated NeurLZ on\u0000several real-world HPC application datasets, including Nyx (cosmological\u0000simulation), Miranda (large turbulence simulation), and Hurricane (weather\u0000simulation). 
Experiments demonstrate that our framework achieves up to a 90%\u0000relative reduction in bit rate under the same data distortion, compared to the\u0000best existing approach.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"12 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Thorough Investigation of Content-Defined Chunking Algorithms for Data Deduplication","authors":"Marcel Gregoriadis, Leonhard Balduf, Björn Scheuermann, Johan Pouwelse","doi":"arxiv-2409.06066","DOIUrl":"https://doi.org/arxiv-2409.06066","url":null,"abstract":"Data deduplication emerged as a powerful solution for reducing storage and\u0000bandwidth costs by eliminating redundancies at the level of chunks. This has\u0000spurred the development of numerous Content-Defined Chunking (CDC) algorithms\u0000over the past two decades. Despite advancements, the current state-of-the-art\u0000remains obscure, as a thorough and impartial analysis and comparison is\u0000lacking. We conduct a rigorous theoretical analysis and impartial experimental\u0000comparison of several leading CDC algorithms. Using four realistic datasets, we\u0000evaluate these algorithms against four key metrics: throughput, deduplication\u0000ratio, average chunk size, and chunk-size variance. Our analyses, in many\u0000instances, extend the findings of their original publications by reporting new\u0000results and putting existing ones into context. Moreover, we highlight\u0000limitations that have previously gone unnoticed. Our findings provide valuable\u0000insights that inform the selection and optimization of CDC algorithms for\u0000practical applications in data deduplication.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"58 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211091","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Model Input Verification of Large Scale Simulations","authors":"Rumyana Neykova, Derek Groen","doi":"arxiv-2409.05768","DOIUrl":"https://doi.org/arxiv-2409.05768","url":null,"abstract":"Reliable simulations are critical for analyzing and understanding complex\u0000systems, but their accuracy depends on correct input data. Incorrect inputs\u0000such as invalid or out-of-range values, missing data, and format\u0000inconsistencies can cause simulation crashes or unnoticed result distortions,\u0000ultimately undermining the validity of the conclusions. This paper presents a\u0000methodology for verifying the validity of input data in simulations, a process\u0000we term model input verification (MIV). We implement this approach in FabGuard,\u0000a toolset that uses established data schema and validation tools for the\u0000specific needs of simulation modeling. We introduce a formalism for\u0000categorizing MIV patterns and offer a streamlined verification pipeline that\u0000integrates into existing simulation workflows. FabGuard's applicability is\u0000demonstrated across three diverse domains: conflict-driven migration, disaster\u0000evacuation, and disease spread models. We also explore the use of Large\u0000Language Models (LLMs) for automating constraint generation and inference. In a\u0000case study with a migration simulation, LLMs not only correctly inferred 22 out\u0000of 23 developer-defined constraints, but also identified errors in existing\u0000constraints and proposed new, valid constraints. 
Our evaluation demonstrates\u0000that MIV is feasible on large datasets, with FabGuard efficiently processing\u000012,000 input files in 140 seconds and maintaining consistent performance across\u0000varying file sizes.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"15 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211092","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Joint Model Assignment and Resource Allocation for Cost-Effective Mobile Generative Services","authors":"Shuangwei Gao, Peng Yang, Yuxin Kong, Feng Lyu, Ning Zhang","doi":"arxiv-2409.09072","DOIUrl":"https://doi.org/arxiv-2409.09072","url":null,"abstract":"Artificial Intelligence Generated Content (AIGC) services can efficiently\u0000satisfy user-specified content creation demands, but the high computational\u0000requirements pose various challenges to supporting mobile users at scale. In\u0000this paper, we present our design of an edge-enabled AIGC service provisioning\u0000system to properly assign computing tasks of generative models to edge servers,\u0000thereby improving overall user experience and reducing content generation\u0000latency. Specifically, once the edge server receives user requested task\u0000prompts, it dynamically assigns appropriate models and allocates computing\u0000resources based on features of each category of prompts. The generated contents\u0000are then delivered to users. The key to this system is a proposed probabilistic\u0000model assignment approach, which estimates the quality score of generated\u0000contents for each prompt based on category labels. 
Next, we introduce a\u0000heuristic algorithm that enables adaptive configuration of both generation\u0000steps and resource allocation, according to the various task requests received\u0000by each generative model on the edge.Simulation results demonstrate that the\u0000designed system can effectively enhance the quality of generated content by up\u0000to 4.7% while reducing response delay by up to 39.1% compared to benchmarks.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"24 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263654","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DNA sequence alignment: An assignment for OpenMP, MPI, and CUDA/OpenCL","authors":"Arturo Gonzalez-EscribanoUniversidad de Valladolid, Spain, Diego García-ÁlvarezUniversidad de Valladolid, Spain, Jesús CámaraUniversidad de Valladolid, Spain","doi":"arxiv-2409.06075","DOIUrl":"https://doi.org/arxiv-2409.06075","url":null,"abstract":"We present an assignment for a full Parallel Computing course. Since\u00002017/2018, we have proposed a different problem each academic year to\u0000illustrate various methodologies for approaching the same computational problem\u0000using different parallel programming models. They are designed to be\u0000parallelized using shared-memory programming with OpenMP, distributed-memory\u0000programming with MPI, and GPU programming with CUDA or OpenCL. The problem\u0000chosen for this year implements a brute-force solution for exact DNA sequence\u0000alignment of multiple patterns. The program searches for exact coincidences of\u0000multiple nucleotide strings in a long DNA sequence. The sequential\u0000implementation is designed to be clear and understandable to students while\u0000offering many opportunities for parallelization and optimization. This\u0000assignment addresses key concepts many students find difficult to apply in\u0000practical scenarios: race conditions, reductions, collective operations, and\u0000point-to-point communications. It also covers the problem of parallel\u0000generation of pseudo-random sequences and strategies to notify and stop\u0000speculative computations when matches are found. This assignment serves as an\u0000exercise that reinforces basic knowledge and prepares students for more complex\u0000parallel computing concepts and structures. It has been successfully\u0000implemented as a practical assignment in a Parallel Computing course in the\u0000third year of a Computer Engineering degree program. 
Supporting materials for\u0000this and previous assignments in this series are publicly available.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"80 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ELMS: Elasticized Large Language Models On Mobile Devices","authors":"Wangsong Yin, Rongjie Yi, Daliang Xu, Gang Huang, Mengwei Xu, Xuanzhe Liu","doi":"arxiv-2409.09071","DOIUrl":"https://doi.org/arxiv-2409.09071","url":null,"abstract":"On-device Large Language Models (LLMs) are revolutionizing mobile AI,\u0000enabling applications such as UI automation while addressing privacy concerns.\u0000Currently, the standard approach involves deploying a single, robust LLM as a\u0000universal solution for various applications, often referred to as\u0000LLM-as-a-Service (LLMaaS). However, this approach faces a significant system\u0000challenge: existing LLMs lack the flexibility to accommodate the diverse\u0000Service-Level Objectives (SLOs) regarding inference latency across different\u0000applications. To address this issue, we introduce ELMS, an on-device LLM\u0000service designed to provide elasticity in both the model and prompt dimensions\u0000of an LLMaaS. This system includes: A one-time neuron reordering technique,\u0000which utilizes the inherent permutation consistency within transformer models\u0000to create high-quality, elastic sub-models with minimal runtime switching\u0000costs. A dual-head compact language model, which efficiently refines prompts\u0000and coordinates the elastic adaptation between the model and the prompt. We\u0000have implemented this elastic on-device LLM service on several off-the-shelf\u0000(COTS) smartphones and evaluate ELMS using both standalone NLP/mobile-agent\u0000datasets and synthesized end-to-end traces. 
Across a range of SLOs, ELMS\u0000surpasses four strong baselines by up to 16.83% and 11.04% in absolute accuracy\u0000on average, with less than 1% Time-To-First-Token (TTFT) switching overhead,\u0000comparable memory usage, and fewer than 100 offline GPU hours.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"31 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142263657","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CloudNativeSim: a toolkit for modeling and simulation of cloud-native applications","authors":"Jingfeng Wu, Minxian Xu, Yiyuan He, Kejiang Ye, Chengzhong Xu","doi":"arxiv-2409.05093","DOIUrl":"https://doi.org/arxiv-2409.05093","url":null,"abstract":"Cloud-native applications are increasingly becoming popular in modern\u0000software design. Employing a microservice-based architecture into these\u0000applications is a prevalent strategy that enhances system availability and\u0000flexibility. However, cloud-native applications also introduce new challenges,\u0000such as frequent inter-service communication and the complexity of managing\u0000heterogeneous codebases and hardware, resulting in unpredictable complexity and\u0000dynamism. Furthermore, as applications scale, only limited research teams or\u0000enterprises possess the resources for large-scale deployment and testing, which\u0000impedes progress in the cloud-native domain. To address these challenges, we\u0000propose CloudNativeSim, a simulator for cloud-native applications with a\u0000microservice-based architecture. CloudNativeSim offers several key benefits:\u0000(i) comprehensive and dynamic modeling for cloud-native applications, (ii) an\u0000extended simulation framework with new policy interfaces for scheduling\u0000cloud-native applications, and (iii) support for customized application\u0000scenarios and user feedback based on Quality of Service (QoS) metrics.\u0000CloudNativeSim can be easily deployed on standard computers to manage a high\u0000volume of requests and services. Its performance was validated through a case\u0000study, demonstrating higher than 94.5% accuracy in terms of response time. 
The\u0000study further highlights the feasibility of CloudNativeSim by illustrating the\u0000effects of various scaling policies.","PeriodicalId":501422,"journal":{"name":"arXiv - CS - Distributed, Parallel, and Cluster Computing","volume":"14 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142211096","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}