{"title":"Toward Smart Scheduling in Tapis","authors":"Joe Stubbs, Smruti Padhy, Richard Cardone","doi":"arxiv-2408.03349","DOIUrl":"https://doi.org/arxiv-2408.03349","url":null,"abstract":"The Tapis framework provides APIs for automating job execution on remote\u0000resources, including HPC clusters and servers running in the cloud. Tapis can\u0000simplify the interaction with remote cyberinfrastructure (CI), but the current\u0000services require users to specify the exact configuration of a job to run,\u0000including the system, queue, node count, and maximum run time, among other\u0000attributes. Moreover, the remote resources must be defined and configured in\u0000Tapis before a job can be submitted. In this paper, we present our efforts to\u0000develop an intelligent job scheduling capability in Tapis, where various\u0000attributes about a job configuration can be automatically determined for the\u0000user, and computational resources can be dynamically provisioned by Tapis for\u0000specific jobs. We develop an overall architecture for such a feature, which\u0000suggests a set of core challenges to be solved. Then, we focus on one such\u0000specific challenge: predicting queue times for a job on different HPC systems\u0000and queues, and we present two sets of results based on machine learning\u0000methods. Our first set of results cast the problem as a regression, which can\u0000be used to select the best system from a list of existing options. 
Our second\u0000set of results frames the problem as a classification, allowing us to compare\u0000the use of an existing system with a dynamically provisioned resource.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"18 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
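The queue-time regression idea in the Tapis abstract can be sketched in miniature: fit a per-queue model of historical wait times, then pick the queue with the lowest prediction. The queue names, the single node-count feature, and all data below are hypothetical; the paper's actual models are ML-based rather than this closed-form one-feature fit.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def predict(model, x):
    slope, intercept = model
    return slope * x + intercept

# Synthetic history: (node counts, observed queue waits in minutes) per queue.
history = {
    "cluster_a/normal": ([1, 4, 8, 16], [10, 35, 70, 150]),
    "cluster_b/skx":    ([1, 4, 8, 16], [5, 12, 20, 40]),
}

models = {q: fit_ols(xs, ys) for q, (xs, ys) in history.items()}

def best_queue(node_count):
    """Select the queue with the lowest predicted wait for this job size."""
    return min(models, key=lambda q: predict(models[q], node_count))
```

A classification variant, as in the paper's second set of results, would instead predict whether the wait exceeds the provisioning time of a dynamic resource.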
{"title":"Solving Large Rank-Deficient Linear Least-Squares Problems on Shared-Memory CPU Architectures and GPU Architectures","authors":"Mónica Chillarón, Gregorio Quintana-Ortí, Vicente Vidal, Per-Gunnar Martinsson","doi":"arxiv-2408.05238","DOIUrl":"https://doi.org/arxiv-2408.05238","url":null,"abstract":"Solving very large linear systems of equations is a key computational task in\u0000science and technology. In many cases, the coefficient matrix of the linear\u0000system is rank-deficient, leading to systems that may be underdetermined,\u0000inconsistent, or both. In such cases, one generally seeks to compute the least\u0000squares solution that minimizes the residual of the problem, which can be\u0000further defined as the solution with smallest norm in cases where the\u0000coefficient matrix has a nontrivial nullspace. This work presents several new\u0000techniques for solving least squares problems involving coefficient matrices\u0000that are so large that they do not fit in main memory. The implementations\u0000include both CPU and GPU variants. All techniques rely on complete orthogonal\u0000decompositions that guarantee that both conditions of a least squares solution\u0000are met, regardless of the rank properties of the matrix. Specifically, they\u0000rely on the recently proposed \"randUTV\" algorithm that is particularly\u0000effective in strongly communication-constrained environments. 
A detailed\u0000precision and performance study reveals that the new methods, that operate on\u0000data stored on disk, are competitive with state-of-the-art methods that store\u0000all data in main memory.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142225119","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
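The minimum-norm condition discussed in this abstract can be illustrated on a toy case: for a full-row-rank underdetermined system, the minimum-norm solution is x = A^T (A A^T)^{-1} b, written out here for a 2x3 system so the 2x2 inverse is explicit. This is only a sketch of the property; the paper's randUTV-based complete orthogonal decompositions handle general rank deficiency (including inconsistent systems) at out-of-core scale.

```python
def minnorm_underdetermined(A, b):
    """Minimum-norm solution x = A^T (A A^T)^{-1} b for a full-row-rank
    underdetermined 2x3 system (the 2x2 Gram inverse is written explicitly)."""
    (a11, a12, a13), (a21, a22, a23) = A
    # Gram matrix G = A A^T (2x2) and its explicit inverse.
    g11 = a11 * a11 + a12 * a12 + a13 * a13
    g12 = a11 * a21 + a12 * a22 + a13 * a23
    g22 = a21 * a21 + a22 * a22 + a23 * a23
    det = g11 * g22 - g12 * g12
    y1 = ( g22 * b[0] - g12 * b[1]) / det
    y2 = (-g12 * b[0] + g11 * b[1]) / det
    # x = A^T y lies in the row space of A, hence has minimal norm.
    return [a11 * y1 + a21 * y2, a12 * y1 + a22 * y2, a13 * y1 + a23 * y2]

# Underdetermined example: 2 equations, 3 unknowns.
x = minnorm_underdetermined([[1, 0, 1], [0, 1, 1]], [2, 3])
```

Any other solution differs from this x by a nullspace vector orthogonal to it, so it cannot have smaller norm.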
{"title":"Billion-files File Systems (BfFS): A Comparison","authors":"Sohail Shaikh","doi":"arxiv-2408.01805","DOIUrl":"https://doi.org/arxiv-2408.01805","url":null,"abstract":"As the volume of data being produced is increasing at an exponential rate\u0000that needs to be processed quickly, it is reasonable that the data needs to be\u0000available very close to the compute devices to reduce transfer latency. Due to\u0000this need, local filesystems are getting close attention to understand their\u0000inner workings, performance, and more importantly their limitations. This study\u0000analyzes few popular Linux filesystems: EXT4, XFS, BtrFS, ZFS, and F2FS by\u0000creating, storing, and then reading back one billion files from the local\u0000filesystem. The study also captured and analyzed read/write throughput, storage\u0000blocks usage, disk space utilization and overheads, and other metrics useful\u0000for system designers and integrators. Furthermore, the study explored other\u0000side effects such as filesystem performance degradation during and after these\u0000large numbers of files and folders are created.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"78 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GPUDrive: Data-driven, multi-agent driving simulation at 1 million FPS","authors":"Saman Kazemkhani, Aarav Pandya, Daphne Cornelisse, Brennan Shacklett, Eugene Vinitsky","doi":"arxiv-2408.01584","DOIUrl":"https://doi.org/arxiv-2408.01584","url":null,"abstract":"Multi-agent learning algorithms have been successful at generating superhuman\u0000planning in a wide variety of games but have had little impact on the design of\u0000deployed multi-agent planners. A key bottleneck in applying these techniques to\u0000multi-agent planning is that they require billions of steps of experience. To\u0000enable the study of multi-agent planning at this scale, we present GPUDrive, a\u0000GPU-accelerated, multi-agent simulator built on top of the Madrona Game Engine\u0000that can generate over a million steps of experience per second. Observation,\u0000reward, and dynamics functions are written directly in C++, allowing users to\u0000define complex, heterogeneous agent behaviors that are lowered to\u0000high-performance CUDA. We show that using GPUDrive we are able to effectively\u0000train reinforcement learning agents over many scenes in the Waymo Motion\u0000dataset, yielding highly effective goal-reaching agents in minutes for\u0000individual scenes and generally capable agents in a few hours. 
We ship these\u0000trained agents as part of the code base at\u0000https://github.com/Emerge-Lab/gpudrive.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"173 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940482","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding and Enhancing Linux Kernel-based Packet Switching on WiFi Access Points","authors":"Shiqi Zhang, Mridul Gupta, Behnam Dezfouli","doi":"arxiv-2408.01013","DOIUrl":"https://doi.org/arxiv-2408.01013","url":null,"abstract":"As the number of WiFi devices and their traffic demands continue to rise, the\u0000need for a scalable and high-performance wireless infrastructure becomes\u0000increasingly essential. Central to this infrastructure are WiFi Access Points\u0000(APs), which facilitate packet switching between Ethernet and WiFi interfaces.\u0000Despite APs' reliance on the Linux kernel's data plane for packet switching,\u0000the detailed operations and complexities of switching packets between Ethernet\u0000and WiFi interfaces have not been investigated in existing works. This paper\u0000makes the following contributions towards filling this research gap. Through\u0000macro and micro-analysis of empirical experiments, our study reveals insights\u0000in two distinct categories. Firstly, while the kernel's statistics offer\u0000valuable insights into system operations, we identify and discuss potential\u0000pitfalls that can severely affect system analysis. For instance, we reveal the\u0000implications of device drivers on the meaning and accuracy of the statistics\u0000related to packet-switching tasks and processor utilization. Secondly, we\u0000analyze the impact of the packet switching path and core configuration on\u0000performance and power consumption. Specifically, we identify the differences in\u0000Ethernet-to-WiFi and WiFi-to-Ethernet data paths regarding processing\u0000components, multi-core utilization, and energy efficiency. 
We show that the\u0000WiFi-to-Ethernet data path leverages better multi-core processing and exhibits\u0000lower power consumption.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"96 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141940484","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Age of Information Analysis for Multi-Priority Queue and NOMA Enabled C-V2X in IoV","authors":"Zheng Zhang, Qiong Wu, Pingyi Fan, Ke Xiong","doi":"arxiv-2408.00223","DOIUrl":"https://doi.org/arxiv-2408.00223","url":null,"abstract":"As development Internet-of-Vehicles (IoV) technology and demand for\u0000Intelligent Transportation Systems (ITS) increase, there is a growing need for\u0000real-time data and communication by vehicle users. Traditional request-based\u0000methods face challenges such as latency and bandwidth limitations. Mode 4 in\u0000Connected Vehicle-to-Everything (C-V2X) addresses latency and overhead issues\u0000through autonomous resource selection. However, Semi-Persistent Scheduling\u0000(SPS) based on distributed sensing may lead to increased collision.\u0000Non-Orthogonal Multiple Access (NOMA) can alleviate the problem of reduced\u0000packet reception probability due to collisions. Moreover, the concept of Age of\u0000Information (AoI) is introduced as a comprehensive metric reflecting\u0000reliability and latency performance, analyzing the impact of NOMA on C-V2X\u0000communication system. AoI indicates the time a message spends in both local\u0000waiting and transmission processes. In C-V2X, waiting process can be extended\u0000to queuing process, influenced by packet generation rate and Resource\u0000Reservation Interval (RRI). The transmission process is mainly affected by\u0000transmission delay and success rate. In C-V2X, a smaller selection window (SW)\u0000limits the number of available resources for vehicles, resulting in higher\u0000collision rates with increased number of vehicles. SW is generally equal to\u0000RRI, which not only affects AoI in queuing process but also AoI in the\u0000transmission process. 
Therefore, this paper proposes an AoI estimation method\u0000based on multi-priority data type queues and considers the influence of NOMA on\u0000the AoI generated in both processes in C-V2X system under different RRI\u0000conditions. This work aims to gain a better performance of C-V2X system\u0000comparing with some known algorithms.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"173 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141880961","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
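The AoI metric this abstract builds on is the time-average of a sawtooth curve: the receiver's age grows linearly and resets to the packet's system time at each delivery. A minimal single-queue sketch of that integration follows; it is not the paper's multi-priority NOMA analysis, and the event trace is hypothetical.

```python
def average_aoi(events):
    """events: time-ordered list of (t_generated, t_delivered) pairs for
    delivered packets. Returns the time-average Age of Information at the
    receiver, integrating the sawtooth age curve between deliveries."""
    area = 0.0
    prev_deliver, prev_age = None, None
    for t_gen, t_del in events:
        age_at_delivery = t_del - t_gen  # age resets to this packet's delay
        if prev_deliver is not None:
            dt = t_del - prev_deliver
            # Age rises linearly from prev_age to prev_age + dt, then resets:
            # area under that trapezoid is prev_age*dt + dt^2/2.
            area += prev_age * dt + 0.5 * dt * dt
        prev_deliver, prev_age = t_del, age_at_delivery
    span = events[-1][1] - events[0][1]
    return area / span
```

Queuing delay enters through t_generated-to-service gaps, and RRI/collision effects would enter through the delivery times, which is where the paper's per-priority queues and NOMA model come in.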
{"title":"Accelerating Transfer Function Update for Distance Map based Volume Rendering","authors":"Michael Rauter, Lukas Zimmermann, Markus Zeilinger","doi":"arxiv-2407.21552","DOIUrl":"https://doi.org/arxiv-2407.21552","url":null,"abstract":"Direct volume rendering using ray-casting is widely used in practice. By\u0000using GPUs and applying acceleration techniques as empty space skipping, high\u0000frame rates are possible on modern hardware. This enables performance-critical\u0000use-cases such as virtual reality volume rendering. The currently fastest known\u0000technique uses volumetric distance maps to skip empty sections of the volume\u0000during ray-casting but requires the distance map to be updated per transfer\u0000function change. In this paper, we demonstrate a technique for subdividing the\u0000volume intensity range into partitions and deriving what we call partitioned\u0000distance maps. These can be used to accelerate the distance map computation for\u0000a newly changed transfer function by a factor up to 30. This allows the\u0000currently fastest known empty space skipping approach to be used while\u0000maintaining high frame rates even when the transfer function is changed\u0000frequently.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"41 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870319","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"In-Situ Techniques on GPU-Accelerated Data-Intensive Applications","authors":"Yi Ju, Mingshuai Li, Adalberto Perez, Laura Bellentani, Niclas Jansson, Stefano Markidis, Philipp Schlatter, Erwin Laure","doi":"arxiv-2407.20731","DOIUrl":"https://doi.org/arxiv-2407.20731","url":null,"abstract":"The computational power of High-Performance Computing (HPC) systems is\u0000constantly increasing, however, their input/output (IO) performance grows\u0000relatively slowly, and their storage capacity is also limited. This unbalance\u0000presents significant challenges for applications such as Molecular Dynamics\u0000(MD) and Computational Fluid Dynamics (CFD), which generate massive amounts of\u0000data for further visualization or analysis. At the same time, checkpointing is\u0000crucial for long runs on HPC clusters, due to limited walltimes and/or failures\u0000of system components, and typically requires the storage of large amount of\u0000data. Thus, restricted IO performance and storage capacity can lead to\u0000bottlenecks for the performance of full application workflows (as compared to\u0000computational kernels without IO). In-situ techniques, where data is further\u0000processed while still in memory rather to write it out over the I/O subsystem,\u0000can help to tackle these problems. In contrast to traditional post-processing\u0000methods, in-situ techniques can reduce or avoid the need to write or read data\u0000via the IO subsystem. They offer a promising approach for applications aiming\u0000to leverage the full power of large scale HPC systems. In-situ techniques can\u0000also be applied to hybrid computational nodes on HPC systems consisting of\u0000graphics processing units (GPUs) and central processing units (CPUs). On one\u0000node, the GPUs would have significant performance advantages over the CPUs.\u0000Therefore, current approaches for GPU-accelerated applications often focus on\u0000maximizing GPU usage, leaving CPUs underutilized. 
In-situ tasks using CPUs to\u0000perform data analysis or preprocess data concurrently to the running\u0000simulation, offer a possibility to improve this underutilization.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"8 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141873316","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards an Integrated Performance Framework for Fire Science and Management Workflows","authors":"H. Ahmed, R. Shende, I. Perez, D. Crawl, S. Purawat, I. Altintas","doi":"arxiv-2407.21231","DOIUrl":"https://doi.org/arxiv-2407.21231","url":null,"abstract":"Reliable performance metrics are necessary prerequisites to building\u0000large-scale end-to-end integrated workflows for collaborative scientific\u0000research, particularly within context of use-inspired decision making platforms\u0000with many concurrent users and when computing real-time and urgent results\u0000using large data. This work is a building block for the National Data Platform,\u0000which leverages multiple use-cases including the WIFIRE Data and Model Commons\u0000for wildfire behavior modeling and the EarthScope Consortium for collaborative\u0000geophysical research. This paper presents an artificial intelligence and\u0000machine learning (AI/ML) approach to performance assessment and optimization of\u0000scientific workflows. An associated early AI/ML framework spanning performance\u0000data collection, prediction and optimization is applied to wildfire science\u0000applications within the WIFIRE BurnPro3D (BP3D) platform for proactive fire\u0000management and mitigation.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"221 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870322","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Understanding the Impact of Synchronous, Asynchronous, and Hybrid In-Situ Techniques in Computational Fluid Dynamics Applications","authors":"Yi Ju, Adalberto Perez, Stefano Markidis, Philipp Schlatter, Erwin Laure","doi":"arxiv-2407.20717","DOIUrl":"https://doi.org/arxiv-2407.20717","url":null,"abstract":"High-Performance Computing (HPC) systems provide input/output (IO)\u0000performance growing relatively slowly compared to peak computational\u0000performance and have limited storage capacity. Computational Fluid Dynamics\u0000(CFD) applications aiming to leverage the full power of Exascale HPC systems,\u0000such as the solver Nek5000, will generate massive data for further processing.\u0000These data need to be efficiently stored via the IO subsystem. However, limited\u0000IO performance and storage capacity may result in performance, and thus\u0000scientific discovery, bottlenecks. In comparison to traditional post-processing\u0000methods, in-situ techniques can reduce or avoid writing and reading the data\u0000through the IO subsystem, promising to be a solution to these problems. In this\u0000paper, we study the performance and resource usage of three in-situ use cases:\u0000data compression, image generation, and uncertainty quantification. We\u0000furthermore analyze three approaches when these in-situ tasks and the\u0000simulation are executed synchronously, asynchronously, or in a hybrid manner.\u0000In-situ compression can be used to reduce the IO time and storage requirements\u0000while maintaining data accuracy. Furthermore, in-situ visualization and\u0000analysis can save Terabytes of data from being routed through the IO subsystem\u0000to storage. However, the overall efficiency is crucially dependent on the\u0000characteristics of both, the in-situ task and the simulation. In some cases,\u0000the overhead introduced by the in-situ tasks can be substantial. 
Therefore, it\u0000is essential to choose the proper in-situ approach, synchronous, asynchronous,\u0000or hybrid, to minimize overhead and maximize the benefits of concurrent\u0000execution.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"77 1","pages":""},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141870321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
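The asynchronous approach compared in this abstract can be sketched with a worker thread: the "simulation" hands each step's data to a queue and continues immediately, while a helper compresses it in memory instead of the raw data going through the IO subsystem. The data layout, step count, and choice of zlib compression here are illustrative only; the synchronous variant would simply compress inline between steps.

```python
import queue
import threading
import zlib

def run_async_insitu(steps):
    """Run a fake simulation for `steps` iterations with an asynchronous
    in-situ compression task; returns the compressed size per step."""
    work = queue.Queue()
    compressed_sizes = []

    def insitu_worker():
        while True:
            item = work.get()
            if item is None:          # sentinel: simulation finished
                break
            compressed_sizes.append(len(zlib.compress(item)))

    t = threading.Thread(target=insitu_worker)
    t.start()
    for step in range(steps):
        # Fake field data: a smooth ramp, standing in for a CFD snapshot.
        field = bytes((step + i) % 251 for i in range(4096))
        work.put(field)               # hand off; simulation continues at once
    work.put(None)
    t.join()
    return compressed_sizes
```

In CPython the GIL limits true concurrency for pure-Python workers, so a real setup would use processes or native in-situ libraries; the hand-off pattern is the point of the sketch.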