Journal of Parallel and Distributed Computing: Latest Articles

Optimizing parallel heterogeneous system efficiency: Dynamic task graph adaptation with recursive tasks
IF 4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-28 DOI: 10.1016/j.jpdc.2025.105157
Nathalie Furmento, Abdou Guermouche, Gwenolé Lucas, Thomas Morin, Samuel Thibault, Pierre-André Wacrenier
Abstract: Task-based programming models are currently a major trend for leveraging heterogeneous parallel systems productively (OpenACC, Kokkos, Legion, OmpSs, PaRSEC, StarPU, XKaapi, ...). Among these models, the Sequential Task Flow (STF) model is widely embraced (PaRSEC's DTD, OmpSs, StarPU) because it lets task graphs be expressed naturally through a sequential-looking submission of tasks, with task dependencies inferred automatically. However, STF is limited to task graphs whose task sizes are fixed at submission, which makes it hard to determine the optimal task granularity. Notably, in heterogeneous systems the optimal task size varies across processing units, so a single task size does not fit all units. StarPU's recursive tasks allow graphs with several task granularities by dynamically turning some tasks into sub-graphs at runtime. The decision to transform these tasks into sub-graphs is made by a StarPU component called the Splitter. Once some tasks have been transformed, classical scheduling approaches are used, which makes this component generic and orthogonal to the scheduler. In this paper, we propose a new Splitter policy designed for heterogeneous platforms that relies on linear programming to minimize execution time and maximize resource utilization. This yields a dynamic, well-balanced set comprising both small tasks to fill multiple CPU cores and large tasks for efficient execution on accelerators such as GPUs. We then present an experimental evaluation showing that just-in-time adaptations of the task graph improve performance across various dense linear algebra algorithms.
Volume 205, Article 105157.
Citations: 0
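The STF idea above, that dependencies are inferred from a sequential-looking submission order, can be illustrated with a minimal Python sketch. It assumes each task declares which data handles it reads and writes, and derives read-after-write, write-after-write and write-after-read edges from that history. The class, method and handle names are hypothetical; this is not StarPU's or PaRSEC's API, and recursive tasks and the Splitter are not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    reads: tuple = ()
    writes: tuple = ()
    deps: set = field(default_factory=set)  # names of tasks this one must wait for


class SequentialTaskFlow:
    """Hypothetical STF front end: infers dependencies from submission order."""

    def __init__(self):
        self.tasks = []
        self.last_writer = {}    # handle -> name of the most recent writer
        self.readers_since = {}  # handle -> readers since that write

    def submit(self, name, reads=(), writes=()):
        task = Task(name, tuple(reads), tuple(writes))
        for h in task.reads:
            # Read-after-write: wait for the last task that wrote h.
            if h in self.last_writer:
                task.deps.add(self.last_writer[h])
        for h in task.writes:
            # Write-after-write: keep writes to h in submission order.
            if h in self.last_writer:
                task.deps.add(self.last_writer[h])
            # Write-after-read: do not overwrite h before earlier readers finish.
            task.deps.update(self.readers_since.get(h, set()))
        # Update the per-handle access history only after computing dependencies.
        for h in task.reads:
            self.readers_since.setdefault(h, set()).add(name)
        for h in task.writes:
            self.last_writer[h] = name
            self.readers_since[h] = set()
        self.tasks.append(task)
        return task
```

For example, submitting `potrf` (writes `A00`), then `trsm` (reads `A00`, writes `A10`), then a task that overwrites `A00` makes the last task depend on both earlier ones, even though the program text never states those edges.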
To repair or not to repair: Assessing fault resilience in MPI stencil applications
IF 4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-25 DOI: 10.1016/j.jpdc.2025.105156
Roberto Rocco , Elisabetta Boella , Daniele Gregori , Gianluca Palermo
Abstract: With the increasing size of HPC computations, faults are becoming more and more relevant in the HPC field. The MPI standard does not define application behaviour after a fault, leaving the burden of fault management to the user, who usually resorts to checkpoint and restart mechanisms. This is especially true in stencil applications, whose regular pattern simplifies the selection of checkpoint locations. However, checkpoint and restart mechanisms introduce non-negligible overhead, disk load, and scalability concerns. In this paper, we show an alternative based on fault resilience, enabled by the features of the User Level Failure Mitigation (ULFM) extension and shipped within the Legio fault resilience framework. With fault resilience, we continue executing only the non-failed processes, sacrificing result accuracy for faster fault recovery. Our experiments on several sample stencil applications show that, despite the visible impact of faults on the results, we produced meaningful values usable for scientific research, demonstrating the potential of a fault resilience approach in stencil scenarios.
Volume 205, Article 105156.
Citations: 0
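To make the trade-off concrete (keep running on the surviving processes, accept a less accurate result), here is a toy Python model of a 1D explicit diffusion stencil split into equal blocks, one per hypothetical process. When a block "fails", its cells are simply frozen at their last values and neighbors keep reading them as stale halo data. This recovery behaviour is an assumption made for illustration; it does not reproduce ULFM or Legio, where the application decides how to handle data owned by failed ranks.

```python
def diffusion_step(u, alpha, frozen=frozenset()):
    """One explicit diffusion step on a 1D grid with zero Dirichlet boundaries.

    Cells listed in `frozen` belong to a failed process (assumption): they are
    not updated, and neighbors read their stale values as halo data.
    """
    n = len(u)
    new = u[:]
    for i in range(n):
        if i in frozen:
            continue
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        new[i] = u[i] + alpha * (left - 2.0 * u[i] + right)
    return new


def run(n_cells=40, n_blocks=4, steps=50, alpha=0.25, failed_block=None, fail_step=None):
    """Run the stencil; optionally fail one block at `fail_step` and keep going."""
    u = [0.0] * n_cells
    for i in range(18, 22):  # initial hot spot in the middle of the domain
        u[i] = 1.0
    block = n_cells // n_blocks
    frozen = frozenset()
    snapshot = None
    for step in range(steps):
        if failed_block is not None and step == fail_step:
            frozen = frozenset(range(failed_block * block, (failed_block + 1) * block))
            snapshot = [u[i] for i in sorted(frozen)]  # last values held by the failed block
        u = diffusion_step(u, alpha, frozen)
    return u, snapshot
```

With `alpha <= 0.5` every update is a convex combination of neighboring values, so the degraded run stays within the physical range [0, 1] even though it diverges from the fault-free run; whether that deviation is acceptable is the question the paper studies on real stencil applications.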
Federated multi-task learning with cross-device heterogeneous task subsets
IF 4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-25 DOI: 10.1016/j.jpdc.2025.105155
Zewei Xin, Qinya Li, Chaoyue Niu, Fan Wu, Guihai Chen
Abstract: Traditional Federated Learning (FL) predominantly focuses on task-consistent scenarios, assuming that clients possess identical tasks or task sets. However, in multi-task scenarios, client task sets can vary greatly depending on operating environments, available resources, and hardware configurations, and conventional task-consistent FL cannot address such heterogeneity effectively. We define this statistical heterogeneity of task sets, where each client performs a unique subset of the server's tasks, as cross-device task heterogeneity. In this work, we propose a novel Federated Partial Multi-task (FedPMT) method that allows clients with diverse task sets to collaborate and train comprehensive models suitable for any task subset. Specifically, clients deploy partial multi-task models tailored to their local task sets, while the server uses single-task models as an intermediate stage to address the model heterogeneity arising from differing task sets. Collaborative training is enabled through bidirectional transformations between the two. To alleviate the negative transfer caused by task set disparities, we introduce task attenuation factors that modulate the influence of different tasks. This adjustment enhances the performance and task generalization ability of the cloud models, driving them to converge toward a shared optimum across all task subsets. Extensive experiments on the NYUD-v2, PASCAL Context and Cityscapes datasets validate the effectiveness and superiority of FedPMT.
Volume 205, Article 105155.
Citations: 0
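Cross-device task heterogeneity means each client trains heads for only a subset of the server's tasks. A common baseline for aggregating such partial models is a FedAvg-style weighted average computed per task over only the clients that hold that task. The Python sketch below shows that baseline with parameters flattened into plain lists; it is an assumption for illustration, the task names are hypothetical, and it does not implement FedPMT's single-task intermediate models, bidirectional transformations, or task attenuation factors.

```python
def aggregate_per_task(client_updates):
    """FedAvg-style aggregation restricted, per task, to clients that train it.

    client_updates: list of dicts mapping task name -> (params, n_samples),
    where params is a flat list of floats for that task's head.
    Returns a dict mapping task name -> sample-weighted average params.
    """
    weighted_sums = {}
    sample_counts = {}
    for update in client_updates:
        for task, (params, n_samples) in update.items():
            acc = weighted_sums.setdefault(task, [0.0] * len(params))
            if len(acc) != len(params):
                raise ValueError(f"shape mismatch for task {task!r}")
            for i, p in enumerate(params):
                acc[i] += n_samples * p
            sample_counts[task] = sample_counts.get(task, 0) + n_samples
    # Tasks held by no client simply do not appear in the result.
    return {
        task: [value / sample_counts[task] for value in acc]
        for task, acc in weighted_sums.items()
    }
```

A client that never trains a task contributes nothing to that task's average, which is the minimal requirement any cross-device heterogeneous scheme has to satisfy.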
Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-11 DOI: 10.1016/S0743-7315(25)00116-9
Volume 204, Article 105149.
Citations: 0
DH_Aligner: A fast short-read aligner on multicore platforms with AVX vectorization
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-04 DOI: 10.1016/j.jpdc.2025.105142
Qiao Sun , Feng Chen , Leisheng Li , Huiyuan Li
Abstract: The rapid development of NGS (Next-Generation Sequencing) technology produces massive genome data at a much higher throughput than before, creating great demand for fast and accurate downstream genetic analysis. As one of the first steps of a bioinformatics workflow, read alignment makes an educated guess about where and how a read maps to a given reference sequence. In this paper, we propose DH_Aligner, a fast and accurate short-read aligner designed and optimized for x86 multi-core platforms with avx2/avx512 SIMD instruction sets. It is based on a three-phase alignment workflow (seeding, filtering, extension) and provides an end-to-end solution for read alignment from Fastq to SAM files. Thanks to a fast seeding scheme and a seed filtering procedure, DH_Aligner avoids both a time-consuming seeding phase and the redundant work of aligning reads at apparently wrong locations. With a batched-processing methodology, parallelism is easily exploited at the data, instruction and thread levels. The performance-critical kernels in DH_Aligner are implemented with both avx2 and avx512 intrinsics for better performance and portability. On two typical x86-based platforms, Intel Xeon-6154 and Hygon C86-7285, DH_Aligner achieves near-best accuracy/sensitivity while outperforming state-of-the-art parallel implementations, with average speedups of 7.8x, 3.4x, 2.8x-6.7x and 1.5x over bwa-mem, bwa-mem2, bowtie2 and minimap2, respectively.
Volume 205, Article 105142.
Citations: 0
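The three-phase seeding, filtering and extension workflow can be sketched in plain Python. The version below is deliberately simplified and hypothetical: exact k-mer seeds from a hash index, a filter that keeps only candidate reference offsets supported by at least `min_votes` seeds, and an ungapped extension that counts mismatches. Production aligners typically use gapped, vectorized dynamic-programming extension instead, and the avx2/avx512 batching described above is not modeled here.

```python
from collections import Counter, defaultdict


def build_index(ref, k):
    """Hash index: every k-mer of the reference -> list of its start positions."""
    index = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index


def align(read, ref, index, k, min_votes=2):
    """Seed, filter and extend one read; returns (start, mismatches) or None."""
    # Seeding: each exact k-mer hit votes for an implied read start on the reference.
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if 0 <= start <= len(ref) - len(read):
                votes[start] += 1
    # Filtering: discard candidate starts supported by too few seeds.
    candidates = sorted(s for s, v in votes.items() if v >= min_votes)
    # Extension (ungapped): score each surviving candidate by its mismatch count.
    best = None
    for start in candidates:
        window = ref[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if best is None or mismatches < best[1]:
            best = (start, mismatches)
    return best
```

The filter is what saves work: a single mismatch only spoils the seeds that overlap it, so the true location keeps many votes while spurious single-seed hits are dropped before the comparatively expensive extension.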
Integration framework for online thread throttling with thread and page mapping on NUMA systems
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-04 DOI: 10.1016/j.jpdc.2025.105145
Janaina Schwarzrock , Hiago Mayk G. de A. Rocha , Arthur F. Lorenzon , Samuel Xavier de Souza , Antonio Carlos S. Beck
Abstract: Non-Uniform Memory Access (NUMA) systems are prevalent in HPC, where optimal thread-to-core allocation and page placement are crucial for enhancing performance and minimizing energy usage. Moreover, since NUMA systems support a large number of hardware threads while many parallel applications have limited scalability, artificially decreasing the number of threads with Dynamic Concurrency Throttling (DCT) may bring further improvements. However, the optimal configuration (thread mapping, page mapping, number of threads) for energy and performance, quantified by the Energy-Delay Product (EDP), varies with the system hardware, the application and the input set, even during execution. Because of this dynamic nature, adaptability is essential, which makes offline strategies much less effective. Despite their effectiveness, online strategies introduce additional execution overhead, from learning at run time and from transitions between configurations (cache warm-ups, thread and data reallocation). Balancing learning time against solution quality therefore becomes increasingly important. In this scenario, this work proposes a framework that integrates the search for these optimal configurations into a single, online, and efficient approach. Our experimental evaluation shows that the framework improves EDP and performance compared to online state-of-the-art techniques for thread/page mapping (by up to 69.3% and 43.4%) and DCT (by up to 93.2% and 74.9%), while being fully adaptive and requiring minimal user intervention.
Volume 205, Article 105145.
Citations: 0
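The Energy-Delay Product is energy multiplied by execution time (EDP = E × t), so lower is better. The Python sketch below shows one generic way an online search can trade learning cost against solution quality, for the thread-count dimension only: start at the maximum thread count and hill-climb to neighboring counts while EDP improves, caching each measurement. The `measure` callback and the sample numbers are hypothetical; this is not the paper's framework, which also adapts thread and page mapping.

```python
def edp(energy_j, time_s):
    """Energy-Delay Product: energy (J) times execution time (s)."""
    return energy_j * time_s


def hill_climb_threads(candidates, measure):
    """Greedy online search over sorted thread counts, starting from the largest.

    measure(t) -> (time_s, energy_j) runs or samples the application with t threads.
    Returns (best thread count, its EDP, set of thread counts actually measured).
    """
    cache = {}

    def cost(t):
        if t not in cache:  # each new configuration costs one measurement
            time_s, energy_j = measure(t)
            cache[t] = edp(energy_j, time_s)
        return cache[t]

    i = len(candidates) - 1
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(candidates)]
        if not neighbors:
            return candidates[i], cost(candidates[i]), set(cache)
        best_j = min(neighbors, key=lambda j: cost(candidates[j]))
        if cost(candidates[best_j]) < cost(candidates[i]):
            i = best_j
        else:
            return candidates[i], cost(candidates[i]), set(cache)
```

Hill climbing can stop at a local minimum when EDP is not unimodal in the thread count; the payoff is that it usually measures fewer configurations than exhaustive search, which is the learning-time side of the trade-off the abstract describes.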
Complexity analysis and scalability of a matrix-free extrapolated geometric multigrid solver for curvilinear coordinates representations from fusion plasma applications
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-03 DOI: 10.1016/j.jpdc.2025.105143
Philippe Leleux , Christina Schwarz , Martin J. Kühn , Carola Kruse , Ulrich Rüde
Abstract: Tokamak fusion reactors are promising alternatives for future energy production. Gyrokinetic simulations are important tools for understanding physical processes inside tokamaks and improving the design of future plants. In gyrokinetic codes such as Gysela, these simulations involve, at each time step, the solution of a gyrokinetic Poisson equation defined on disk-like cross sections. The authors of [14], [15] proposed to discretize a simplified differential equation using symmetric finite differences derived from the resulting energy functional, and to use an implicitly extrapolated geometric multigrid scheme tailored to problems in curvilinear coordinates. In this article, we extend the discretization to a more realistic partial differential equation and demonstrate the optimal linear complexity of the proposed solver in terms of computation and memory. We provide a general framework to analyze the floating-point operations and memory usage of matrix-free approaches for stencil-based operators. Finally, we give an efficient matrix-free implementation of the solver that exploits task-based multithreaded parallelism and takes advantage of the disk-shaped geometry of the problem. We demonstrate parallel efficiency on problems with up to 50 million unknowns.
Volume 205, Article 105143.
Citations: 0
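Matrix-free stencil operators recompute the coefficients on the fly instead of storing a sparse matrix, so both work and memory per application grow linearly with the number of unknowns. The Python sketch below applies a 5-point finite-difference negative Laplacian on a uniform Cartesian grid and counts floating-point operations and bytes in the spirit of such an analysis. It is a simplified assumption: the paper's operator is discretized in curvilinear (disk-shaped) coordinates, and the CSR figure here is only an upper-bound comparison.

```python
def apply_laplacian(u, nx, ny, h):
    """Matrix-free 5-point negative Laplacian on an nx-by-ny grid (row-major).

    Values outside the grid are treated as zero (homogeneous Dirichlet).
    """
    inv_h2 = 1.0 / (h * h)
    y = [0.0] * (nx * ny)
    for j in range(ny):
        for i in range(nx):
            k = j * nx + i
            west = u[k - 1] if i > 0 else 0.0
            east = u[k + 1] if i < nx - 1 else 0.0
            south = u[k - nx] if j > 0 else 0.0
            north = u[k + nx] if j < ny - 1 else 0.0
            # 1 multiply + 4 subtractions + 1 scaling = 6 flops per point
            y[k] = (4.0 * u[k] - west - east - south - north) * inv_h2
    return y


def stencil_costs(n_points, stencil_size=5, value_bytes=8, index_bytes=4):
    """Per-application flop and memory estimates, matrix-free vs. stored CSR."""
    vectors = 2 * n_points * value_bytes  # input u and output y
    nnz_bound = stencil_size * n_points   # upper bound on CSR nonzeros
    csr = nnz_bound * (value_bytes + index_bytes) + (n_points + 1) * index_bytes
    return {
        "flops": (stencil_size + 1) * n_points,
        "matrix_free_bytes": vectors,
        "csr_bytes": csr + vectors,
    }
```

All three figures are linear in `n_points`; the matrix-free variant simply has a much smaller memory constant, which matters for a solver that must scale to tens of millions of unknowns.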
Towards efficient program execution on edge-cloud computing platforms
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-02 DOI: 10.1016/j.jpdc.2025.105135
Jean-François Dollinger, Vincent Vauchey
Abstract: This paper investigates techniques dedicated to the performance of edge-cloud infrastructures and identifies the challenges that must be addressed to maximize their efficiency. Unlike traditional cloud-only processing, edge-cloud platforms meet the stringent requirements of real-time applications through additional computing resources close to the data source. Yet, because of numerous performance factors, performing efficient computations on such platforms is a complex task. We therefore identify the main performance bottlenecks induced by traditional approaches and extensively discuss the performance characteristics of edge computing platforms. Based on these insights, we design an automated framework capable of achieving end-to-end efficacy for edge-cloud applications. We argue that achieving performance on edge-cloud infrastructures requires adaptive offloading of programs based on their computational requirements. We thus comprehensively study three performance-critical aspects that form the performance workflow of applications: i) performance modelling, ii) program optimization, and iii) task scheduling. First, we explore performance modelling techniques, which form the foundation of most cost models, to accurately predict performance and achieve robust code optimization and scheduling. We then cover the whole program optimization chain, from hotspot detection to code optimization, focusing on memory locality, code parallelization, and acceleration. Finally, we discuss task scheduling techniques for selecting the best computing resource and ensuring a balanced workload distribution. Overall, our study provides insights by covering this performance workflow with reference to prominent state-of-the-art works, focusing particularly on those not yet applied in the context of edge-cloud computing. Additionally, we conducted experiments to further validate our findings. Finally, for each topic of interest, we identify the scientific obstacles already addressed and outline the open research challenges yet to be overcome.
Volume 205, Article 105135.
Citations: 0
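The argument that edge-cloud performance needs adaptive offloading driven by computational requirements can be illustrated with a deliberately simple cost model: the estimated time on a resource is network latency plus input transfer time plus compute time, and each task goes to the resource with the lowest estimate. The resource names, rates and task sizes below are hypothetical, and real schedulers also account for queuing, energy and contention, which this Python sketch ignores.

```python
import math
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    gflops: float         # sustained compute rate (GFLOP/s)
    latency_s: float      # network latency from the data source (0 when local)
    bandwidth_gbs: float  # bandwidth for shipping input data (GB/s); inf when local


def estimated_time(work_gflop, data_gb, r):
    """Latency + input transfer + compute, in seconds."""
    transfer = data_gb / r.bandwidth_gbs  # 0.0 when bandwidth is infinite
    return r.latency_s + transfer + work_gflop / r.gflops


def pick_resource(work_gflop, data_gb, resources):
    """Offload decision: choose the resource with the smallest estimated time."""
    return min(resources, key=lambda r: estimated_time(work_gflop, data_gb, r))


# Hypothetical three-tier platform: the local device, a nearby edge node, a remote cloud.
RESOURCES = [
    Resource("device", gflops=10.0, latency_s=0.0, bandwidth_gbs=math.inf),
    Resource("edge", gflops=100.0, latency_s=0.005, bandwidth_gbs=0.1),
    Resource("cloud", gflops=1000.0, latency_s=0.05, bandwidth_gbs=0.01),
]
```

With these numbers a tiny task (0.1 GFLOP) stays on the device, a medium one (50 GFLOP) goes to the edge, and a large one (5000 GFLOP) is worth the cloud's transfer cost; the point is that the best placement depends on each task's computational requirements rather than being fixed in advance.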
MM-AutoSolver: A multimodal machine learning method for the auto-selection of iterative solvers and preconditioners
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-07-01 DOI: 10.1016/j.jpdc.2025.105144
Hantao Xiong , Wangdong Yang , Weiqing He , Shengle Lin , Keqin Li , Kenli Li
Abstract: Solving large-scale sparse linear systems of the form Ax = b is an important research problem in High-Performance Computing (HPC). With the increasing scale of these systems and the development of both HPC software and hardware, iterative solvers combined with appropriate preconditioners have become the mainstream methods for efficiently solving the sparse linear systems that arise from real-world HPC applications. Among the abundant combinations of iterative solvers and preconditioners, automatically selecting the optimal one has become a vital problem for accelerating the solution of these systems. Previous work has used machine learning or deep learning algorithms to tackle this problem, but fails to abstract and exploit sufficient features from sparse linear systems and thus cannot obtain satisfactory results. In this work, we propose to address the automatic selection of the optimal combination of iterative solver and preconditioner through a multimodal machine learning framework, in which features of different modalities can be fully extracted and exploited to improve the results. Based on this framework, we put forward a multimodal machine learning model called MM-AutoSolver that selects the optimal combination for a given sparse linear system. Experimental results on a new large-scale matrix collection show that MM-AutoSolver outperforms state-of-the-art methods in predictive performance and can significantly accelerate the solution of large-scale sparse linear systems in HPC applications.
Volume 205, Article 105144.
Citations: 0
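As a point of comparison for the multimodal approach, the Python sketch below shows the kind of simple single-modality baseline the abstract contrasts against: a few cheap structural and numerical features of the matrix, followed by a 1-nearest-neighbor lookup in a table of previously benchmarked best solver and preconditioner combinations. The features, labels and history table are hypothetical; MM-AutoSolver itself is a multimodal model and is not reproduced here.

```python
import math


def matrix_features(n, entries):
    """Cheap structural/numerical features of an n-by-n sparse matrix.

    entries: dict mapping (row, col) -> value (explicit nonzeros only).
    Returns (density, symmetric flag, fraction of diagonally dominant rows).
    """
    diag = [0.0] * n
    off = [0.0] * n
    for (i, j), v in entries.items():
        if i == j:
            diag[i] += abs(v)
        else:
            off[i] += abs(v)
    symmetric = all(
        math.isclose(v, entries.get((j, i), 0.0)) for (i, j), v in entries.items()
    )
    dominant = sum(d >= o for d, o in zip(diag, off)) / n
    return (len(entries) / (n * n), float(symmetric), dominant)


def select_solver(features, history):
    """1-nearest-neighbor choice from (features, best solver + preconditioner) pairs."""
    return min(history, key=lambda item: math.dist(features, item[0]))[1]


# Hypothetical results of earlier benchmark runs.
HISTORY = [
    ((0.3, 1.0, 1.0), "CG + Jacobi"),
    ((0.3, 0.0, 1.0), "GMRES + ILU(0)"),
    ((0.3, 0.0, 0.2), "GMRES + AMG"),
]
```

A symmetric, diagonally dominant tridiagonal matrix lands next to the "CG + Jacobi" entry, while breaking its symmetry moves it to "GMRES + ILU(0)". Hand-picked features like these are exactly what the abstract argues are insufficient on their own.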
Parallel watershed partitioning: GPU-based hierarchical image segmentation
IF 3.4 | Zone 3, Computer Science
Journal of Parallel and Distributed Computing Pub Date : 2025-06-27 DOI: 10.1016/j.jpdc.2025.105140
Varduhi Yeghiazaryan , Yeva Gabrielyan , Irina Voiculescu
Abstract: Many image processing applications rely on partitioning an image into disjoint regions whose pixels are ‘similar’. The watershed and waterfall transforms are established mathematical morphology techniques for pixel clustering. Both are relevant to modern applications in which groups of pixels must be decided upon in one go, or in which adjacency information matters. We introduce three new parallel partitioning algorithms for GPUs. By repeatedly applying watershed algorithms, we produce waterfall results that form a hierarchy of partition regions over an input image. Our watershed algorithms attain competitive execution times in both 2D and 3D, processing an 800-megavoxel image in less than 1.4 s. We also show how to use this fully deterministic image partitioning as a pre-processing step for machine-learning-based semantic segmentation. It replaces the role of superpixel algorithms and yields comparable accuracy with faster training times. The code is publicly available at https://github.com/hamemm/PRUF-watershed.git.
Volume 205, Article 105140.
Citations: 0
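For readers unfamiliar with the watershed transform, the classic sequential marker-based flooding variant (Meyer's flooding algorithm) can be sketched with a priority queue: pixels are flooded in increasing height order from labeled markers, and each unlabeled neighbor takes the label of the basin that reaches it first. This Python sketch is an illustration only: it produces no watershed lines, does not build the waterfall hierarchy, and is unrelated to the paper's three parallel GPU algorithms.

```python
import heapq


def watershed(image, markers):
    """Marker-based watershed by priority flooding on a 4-connected 2D grid.

    image:   2D list of heights (e.g. gradient magnitudes).
    markers: 2D list of the same shape; 0 = unlabeled, >0 = basin label.
    Returns a new 2D list of labels; the input markers are not modified.
    """
    h, w = len(image), len(image[0])
    labels = [row[:] for row in markers]
    heap = []
    order = 0  # insertion counter: breaks height ties first-in, first-out
    for y in range(h):
        for x in range(w):
            if labels[y][x]:
                heapq.heappush(heap, (image[y][x], order, y, x))
                order += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == 0:
                # The first basin to reach an unlabeled pixel claims it.
                labels[ny][nx] = labels[y][x]
                heapq.heappush(heap, (image[ny][nx], order, ny, nx))
                order += 1
    return labels
```

Every pixel is pushed at most once, so the sketch runs in O(N log N) time, but its single global priority queue is inherently sequential, which is why GPU formulations need a different strategy.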
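Contact us: info@booksci.cn. Book学术 provides a free academic resource search service that helps scholars in China and abroad search Chinese and English literature, and is committed to providing the most convenient, high-quality service experience. Copyright © 2023 布克学术 All rights reserved.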