{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(25)00116-9","DOIUrl":"10.1016/S0743-7315(25)00116-9","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"204 ","pages":"Article 105149"},"PeriodicalIF":3.4,"publicationDate":"2025-07-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144604858","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DH_Aligner: A fast short-read aligner on multicore platforms with AVX vectorization","authors":"Qiao Sun , Feng Chen , Leisheng Li , Huiyuan Li","doi":"10.1016/j.jpdc.2025.105142","DOIUrl":"10.1016/j.jpdc.2025.105142","url":null,"abstract":"<div><div>The rapid development of the NGS (Next-Generation Sequencing) technology leads to massive genome data produced at a much higher throughput than before, which leads to great demand for downstream fast and accurate genetic analysis. As one of the first steps of bio-informatical work-flow, read alignment makes an educated guess on where and how a read is mapped to a given reference sequence. In this paper, we propose DH_Aligner, a fast and accurate short read aligner designed and optimized for x86 multi-core platforms with <span>avx2/avx512</span> SIMD instruction sets. It is based on a three-phased aligning work-flow: seeding-filtering-extension and provides an end-to-end solution for read alignment from <span>Fastq</span> to <span>SAM</span> files. Due to a fast seeding scheme and a seed filtering procedure, DH_Aligner can avoid both of a time-consuming seeding phase and redundant workload of aligning reads at seemingly wrong locations. With the introduction of batched-processing methodology, parallelism is easily exploited at data-, instruction- and thread-level. The performance-critical kernels in DH_Aligner are implemented by both <span>avx2</span> and <span>avx512</span> intrinsics for a better performance and portability. On two typical x86 based platforms: Intel Xeon-6154 and Hygon C86-7285, DH_Aligner can produce a near-best accuracy/sensitivity while outperform state-of-the-art parallel implementations with average speedup: 7.8x, 3.4x, 2.8x-6.7x and 1.5x over bwa-mem, bwa-mem2, bowtie2 and minimap2 respectively.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"205 ","pages":"Article 105142"},"PeriodicalIF":3.4,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Integration framework for online thread throttling with thread and page mapping on NUMA systems","authors":"Janaina Schwarzrock , Hiago Mayk G. de A. Rocha , Arthur F. Lorenzon , Samuel Xavier de Souza , Antonio Carlos S. Beck","doi":"10.1016/j.jpdc.2025.105145","DOIUrl":"10.1016/j.jpdc.2025.105145","url":null,"abstract":"<div><div>Non-Uniform Memory Access (NUMA) systems are prevalent in HPC, where optimal thread-to-core allocation and page placement are crucial for enhancing performance and minimizing energy usage. Moreover, considering that NUMA systems have hardware support for a large number of hardware threads and many parallel applications have limited scalability, artificially decreasing the number of threads by using Dynamic Concurrency Throttling (DCT) may bring further improvements. However, the optimal configuration (thread mapping, page mapping, number of threads) for energy and performance, quantified by the Energy-Delay Product (EDP), varies with the system hardware, application and input set, even during execution. Because of this dynamic nature, adaptability is essential, making offline strategies much less effective. Despite their effectiveness, online strategies introduce additional execution overhead, which involves learning at run-time and the cost of transitions between configurations with cache warm-ups, thread and data reallocation. Thus, balancing the learning time and solution quality becomes increasingly significant. In this scenario, this work proposes a framework to find such optimal configurations into a single, online, and efficient approach. Our experimental evaluation shows that our framework improves EDP and performance compared to online state-of-the-art techniques of thread/page mapping (up to 69.3% and 43.4%) and DCT (up to 93.2% and 74.9%), while being totally adaptive and requiring minimum user intervention.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"205 ","pages":"Article 105145"},"PeriodicalIF":3.4,"publicationDate":"2025-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571438","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complexity analysis and scalability of a matrix-free extrapolated geometric multigrid solver for curvilinear coordinates representations from fusion plasma applications","authors":"Philippe Leleux , Christina Schwarz , Martin J. Kühn , Carola Kruse , Ulrich Rüde","doi":"10.1016/j.jpdc.2025.105143","DOIUrl":"10.1016/j.jpdc.2025.105143","url":null,"abstract":"<div><div>Tokamak fusion reactors are promising alternatives for future energy production. Gyrokinetic simulations are important tools to understand physical processes inside tokamaks and to improve the design of future plants. In gyrokinetic codes such as Gysela, these simulations involve at each time step the solution of a gyrokinetic Poisson equation defined on disk-like cross sections. The authors of <span><span>[14]</span></span>, <span><span>[15]</span></span> proposed to discretize a simplified differential equation using symmetric finite differences derived from the resulting energy functional and to use an implicitly extrapolated geometric multigrid scheme tailored to problems in curvilinear coordinates. In this article, we extend the discretization to a more realistic partial differential equation and demonstrate the optimal linear complexity of the proposed solver, in terms of computation and memory. We provide a general framework to analyze floating point operations and memory usage of matrix-free approaches for stencil-based operators. Finally, we give an efficient matrix-free implementation for the considered solver exploiting a task-based multithreaded parallelism which takes advantage of the disk-shaped geometry of the problem. We demonstrate the parallel efficiency for the solution of problems of size up to 50 million unknowns.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"205 ","pages":"Article 105143"},"PeriodicalIF":3.4,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144571437","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards efficient program execution on edge-cloud computing platforms","authors":"Jean-François Dollinger, Vincent Vauchey","doi":"10.1016/j.jpdc.2025.105135","DOIUrl":"10.1016/j.jpdc.2025.105135","url":null,"abstract":"<div><div>This paper investigates techniques dedicated to the performance of edge-cloud infrastructures and identifies the challenges to address to maximize their efficiency. Unlike traditional cloud-only processing, edge-cloud platforms meet the stringent requirements of real-time applications via additional computing resources close to the data source. Yet, due to numerous performance factors, it is a complex task to perform efficient computations on such platforms. Thus, we identify the main performance bottlenecks induced by traditional approaches and extensively discuss the performance characteristics of edge computing platforms. Based on these insights, we design an automated framework capable of achieving end-to-end efficacy of edge-cloud applications. We argue that achieving performance on edge-cloud infrastructures requires adaptive offloading of programs based on computational requirements. Thus, we comprehensively study three performance-critical aspects forming the performance workflow of applications: i) performance modelling, ii) program optimization iii) task scheduling. First, we explore performance modelling techniques, forming the foundation of most cost models, to accurately predict and achieve robust code optimization and scheduling. We then cover the whole program optimization chain, from hotspot detection to code optimization, focusing on memory locality, code parallelization, and acceleration. Finally, we discuss task scheduling techniques for selecting the best computing resource and ensuring a balanced workload distribution. Overall, our study provides insights by covering the above performance workflow referencing prominent state-of-the-art works, particularly focusing on those not yet applied in the context of edge-cloud computing. Additionally, we conducted experiments to further validate our findings. Finally, for each topic of interest, we identify the addressed scientific obstacles and outline the open research challenges yet to be overcome.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"205 ","pages":"Article 105135"},"PeriodicalIF":3.4,"publicationDate":"2025-07-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144581112","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MM-AutoSolver: A multimodal machine learning method for the auto-selection of iterative solvers and preconditioners","authors":"Hantao Xiong , Wangdong Yang , Weiqing He , Shengle Lin , Keqin Li , Kenli Li","doi":"10.1016/j.jpdc.2025.105144","DOIUrl":"10.1016/j.jpdc.2025.105144","url":null,"abstract":"<div><div>The solution of large-scale sparse linear systems of the form <span><math><mi>A</mi><mi>x</mi><mo>=</mo><mi>b</mi></math></span> is an important research problem in the field of High-performance Computing (HPC). With the increasing scale of these systems and the development of both HPC software and hardware, iterative solvers along with appropriate preconditioners have become mainstream methods for efficiently solving these sparse linear systems that arise from real-world HPC applications. Among abundant combinations of iterative solvers and preconditioners, the automatic selection of the optimal one has become a vital problem for accelerating the solution of these sparse linear systems. Previous work has utilized machine learning or deep learning algorithms to tackle this problem, but fails to abstract and exploit sufficient features from sparse linear systems, thus unable to obtain satisfactory results. In this work, we propose to address the automatic selection of the optimal combination of iterative solvers and preconditioners through the powerful multimodal machine learning framework, in which features of different modalities can be fully extracted and utilized to improve the results. Based on the multimodal machine learning framework, we put forward a multimodal machine learning model called MM-AutoSolver for the auto-selection of the optimal combination for a given sparse linear system. The experimental results based on a new large-scale matrix collection showcase that the proposed MM-AutoSolver outperforms state-of-the-art methods in predictive performance and has the capability to significantly accelerate the solution of large-scale sparse linear systems in HPC applications.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"205 ","pages":"Article 105144"},"PeriodicalIF":3.4,"publicationDate":"2025-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144536100","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Topology-aware GPU job scheduling with deep reinforcement learning and heuristics","authors":"Hajer Ayadi , Aijun An , Yiming Shao , Hossein Pourmedheji , Junjie Deng , Jimmy X. Huang , Michael Feiman , Hao Zhou","doi":"10.1016/j.jpdc.2025.105138","DOIUrl":"10.1016/j.jpdc.2025.105138","url":null,"abstract":"<div><div>Deep neural networks (DNNs) have gained popularity in many fields such as computer vision, and natural language processing. However, the increasing size of data and complexity of models have made training DNNs time-consuming. While distributed DNN training using multiple GPUs in parallel is a common solution, it introduces challenges in GPU resource management and scheduling. One key challenge is minimizing communication costs among GPUs assigned to a DNN training job. High communication costs—arising from factors such as inter-rack or inter-machine data transfers—can lead to hardware bottlenecks and network delays, ultimately slowing down training. Reducing these costs facilitates more efficient data transfer and synchronization, directly accelerating the training process. Although deep reinforcement learning (DRL) has shown promise in GPU resource scheduling, existing methods often lack considerations for hardware topology. Moreover, most proposed GPU schedulers ignore the possibility of combining heuristic and DRL policies. In response to these challenges, we introduce <span><math><mi>T</mi><mi>o</mi><mi>p</mi><mi>D</mi><mi>R</mi><mi>L</mi></math></span>, an innovative hybrid scheduler that integrates deep reinforcement learning (DRL) and heuristic methods to enhance GPU job scheduling. <span><math><mi>T</mi><mi>o</mi><mi>p</mi><mi>D</mi><mi>R</mi><mi>L</mi></math></span> uses a multi-branch convolutional neural network (CNN) model for job selection and a heuristic method for GPU allocation. At each time step, the CNN model selects a job, and then a heuristic method selects available GPUs closest to each other from the cluster. Reinforcement learning (RL) is used to train the CNN model to select the job that maximizes throughput-based rewards. Extensive evaluation, conducted on datasets with real jobs, shows that <span><math><mi>T</mi><mi>o</mi><mi>p</mi><mi>D</mi><mi>R</mi><mi>L</mi></math></span> significantly outperforms six baseline schedulers that use heuristics or other DRL models for job picking and resource allocation.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"204 ","pages":"Article 105138"},"PeriodicalIF":3.4,"publicationDate":"2025-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144518829","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Edge metric basis and its fault tolerance over certain interconnection networks","authors":"S. Prabhu , T. Jenifer Janany , M. Arulperumjothi , I.G. Yero","doi":"10.1016/j.jpdc.2025.105141","DOIUrl":"10.1016/j.jpdc.2025.105141","url":null,"abstract":"<div><div>The surveillance of elements in an interconnection network is a classical problem in computer engineering. In addition, it is a problem closely related to uniquely identifying the elements of the network, which is indeed a classical distance-related problem in graph theory. This surveillance can be considered for different styles of elements in the network. The classical version centers the attention on the nodes, while some recent variations of it consider monitoring also the edges or both, vertices and edges at the same time. The first style gave rise to graph structures, called edge resolving set and edge metric basis, which is used to uniquely identify the edges of a given network by means of distance vectors. A vertex <em>x</em> in a graph <em>G</em> uniquely recognizes (resolves or identifies) two edges <em>e</em> and <em>f</em> in <em>G</em> if <span><math><msub><mrow><mi>d</mi></mrow><mrow><mi>G</mi></mrow></msub><mo>[</mo><mi>e</mi><mo>,</mo><mi>x</mi><mo>]</mo><mo>≠</mo><msub><mrow><mi>d</mi></mrow><mrow><mi>G</mi></mrow></msub><mo>[</mo><mi>f</mi><mo>,</mo><mi>x</mi><mo>]</mo></math></span>, where <span><math><msub><mrow><mi>d</mi></mrow><mrow><mi>G</mi></mrow></msub><mo>[</mo><mi>e</mi><mo>,</mo><mi>x</mi><mo>]</mo></math></span> stands for the distance between a vertex <em>x</em> and an edge <em>e</em> of <em>G</em>. A set <em>S</em> with the smallest number of vertices, such that every couple of edges is uniquely recognized by a minimum of one vertex in <em>S</em>, is an edge metric basis, and the edge metric dimension refers to the cardinality of such <em>S</em>. Fault tolerance of a working system is the ability of such a system to keep functioning even if one of its parts stops working properly. The fault tolerance property of the edge metric basis is considered in this work. This results in a concept called fault-tolerant edge metric basis. That is, an edge metric basis <em>S</em> of a graph <em>G</em> is fault-tolerant if every pair of edges of <em>G</em> are resolved by a minimum of two vertices in <em>S</em>, and the minimum possible cardinality of such sets is coined as the fault-tolerant edge metric dimension of <em>G</em>. In this work, we present bounds for the edge metric dimension of graphs and its fault tolerance version. In addition, we investigate these parameters for butterfly, Beneš and fractal cubic networks, and found the exact value for their (fault-tolerant) edge metric dimensions.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"204 ","pages":"Article 105141"},"PeriodicalIF":3.4,"publicationDate":"2025-06-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144489472","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dispersion of mobile robots on directed anonymous graphs","authors":"Giuseppe F. Italiano , Debasish Pattanayak , Gokarna Sharma","doi":"10.1016/j.jpdc.2025.105139","DOIUrl":"10.1016/j.jpdc.2025.105139","url":null,"abstract":"<div><div>Given any arbitrary initial configuration of <span><math><mi>k</mi><mo>≤</mo><mi>n</mi></math></span> robots positioned on the nodes of an <em>n</em>-node anonymous graph, the problem of dispersion is to autonomously reposition the robots such that each node will contain at most one robot. This problem gained significant interest due to its resemblance with several fundamental problems such as exploration, scattering, load balancing, relocation of electric cars to charging stations, etc. The objective is to solve dispersion simultaneously minimizing (or providing trade-off between) time and memory requirement at each robot. The literature mainly dealt with dispersion on undirected anonymous graphs. In this paper, we initiate the study of dispersion on directed anonymous graphs. We first show that it may not always be possible to solve dispersion when the directed graph is not strongly connected. We then establish some lower bounds on both time and memory requirement at each robot for solving dispersion on a strongly connected directed graph. Finally, we provide three deterministic algorithms solving dispersion on any strongly connected directed graph. Let <em>D</em> be the graph diameter, <span><math><msub><mrow><mi>Δ</mi></mrow><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub></math></span> be its maximum out-degree, and <em>d</em> be the deficiency (the minimum number of edges needed to add to the graph to make it Eulerian). The first algorithm solves dispersion in <span><math><mi>O</mi><mo>(</mo><mi>d</mi><mo>⋅</mo><msup><mrow><mi>k</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>)</mo></math></span> time with <span><math><mi>O</mi><mo>(</mo><mi>k</mi><mo>⋅</mo><mi>log</mi><mo></mo><mo>(</mo><mi>k</mi><mo>+</mo><msub><mrow><mi>Δ</mi></mrow><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>)</mo><mo>)</mo></math></span> bits at each robot. The second algorithm solves dispersion in <span><math><mi>O</mi><mo>(</mo><msup><mrow><mi>k</mi></mrow><mrow><mn>2</mn></mrow></msup><mo>⋅</mo><msub><mrow><mi>Δ</mi></mrow><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>)</mo></math></span> time with <span><math><mi>O</mi><mo>(</mo><mi>log</mi><mo></mo><mo>(</mo><mi>k</mi><mo>+</mo><msub><mrow><mi>Δ</mi></mrow><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>)</mo><mo>)</mo></math></span> bits at each robot. The third algorithm solves dispersion in <span><math><mi>O</mi><mo>(</mo><mi>k</mi><mo>⋅</mo><mi>D</mi><mo>)</mo></math></span> time with <span><math><mi>O</mi><mo>(</mo><mi>k</mi><mo>⋅</mo><mi>log</mi><mo></mo><mo>(</mo><mi>k</mi><mo>+</mo><msub><mrow><mi>Δ</mi></mrow><mrow><mi>o</mi><mi>u</mi><mi>t</mi></mrow></msub><mo>)</mo><mo>)</mo></math></span> bits at each robot, provided that robots in the 1-hop neighborhood can communicate. 
All three algorithms extend to handle crash faults.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"204 ","pages":"Article 105139"},"PeriodicalIF":3.4,"publicationDate":"2025-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144489471","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
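As a toy illustration of the problem statement, not of the paper's three algorithms, the sketch below disperses robots on a strongly connected digraph: each unsettled robot settles if its node is still free and otherwise follows its next out-edge in round-robin order.

```python
# Toy dispersion on a strongly connected directed graph (illustrative only;
# the paper's algorithms add the bookkeeping needed to meet the stated
# time/memory bounds and to tolerate crash faults).
def disperse(out_edges, start, max_rounds=100_000):
    """out_edges: node -> list of out-neighbors; start: one node per robot."""
    pos = list(start)        # pos[r] = current node of robot r
    settled_at = {}          # node -> id of the robot settled there
    ptr = [0] * len(start)   # per-robot round-robin out-edge pointer
    for _ in range(max_rounds):
        active = [r for r in range(len(pos)) if settled_at.get(pos[r]) != r]
        if not active:
            return pos       # every robot sits alone on its own node
        for r in active:
            node = pos[r]
            if node not in settled_at:
                settled_at[node] = r   # first robot to arrive settles here
            else:
                nxt = out_edges[node]
                pos[r] = nxt[ptr[r] % len(nxt)]  # move on, rotate pointer
                ptr[r] += 1
    raise RuntimeError("round budget exhausted")

ring = {0: [1], 1: [2], 2: [3], 3: [0]}   # directed 4-cycle
print(disperse(ring, [0, 0, 0]))           # -> [0, 1, 2]: one robot per node
```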
{"title":"The CAMINOS interconnection networks simulator","authors":"Cristóbal Camarero, Daniel Postigo, Pablo Fuentes","doi":"10.1016/j.jpdc.2025.105136","DOIUrl":"10.1016/j.jpdc.2025.105136","url":null,"abstract":"<div><div>This work presents CAMINOS, a new interconnection network simulator focusing on router microarchitecture. It was developed in Rust, a novel programming language with a syntax similar to C/C++ and strong memory protection.</div><div>The architecture of CAMINOS emphasizes the composition of components. This allows new designs to be defined in a configuration file without modifying source code, greatly reducing effort and time.</div><div>In addition to simulation functionality, CAMINOS assists in managing a collection of simulations as an experiment. This includes integration with SLURM to support executing batches of simulations and generating PDFs with results and diagnostics.</div><div>We show that CAMINOS makes good use of computing resources. Its memory usage is dominated by in-flight messages, showing low overhead in memory usage. We attest that CAMINOS can effectively use CPU time, as scenarios with little contention execute faster.</div></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"204 ","pages":"Article 105136"},"PeriodicalIF":3.4,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144321378","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}