2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW): Latest Papers

Instrumental Data Management and Scientific Workflow Execution: the CEA Case Study
F. Boito, J. Méhaut, T. Deutsch, B. Videau, F. Desprez
{"title":"Instrumental Data Management and Scientific Workflow Execution: the CEA Case Study","authors":"F. Boito, J. Méhaut, T. Deutsch, B. Videau, F. Desprez","doi":"10.1109/IPDPSW.2019.00139","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00139","url":null,"abstract":"In this paper, we study a typical scenario in research facilities. Instrumental data is generated by lab equipment such as microscopes, collected by researchers on USB devices, and analyzed on their own computers. In this scenario, an instrumental data management framework could store data in an institution-level storage infrastructure and allow tasks that analyze this data to run on available processing nodes. This setup has the advantages of promoting reproducible research and the efficient usage of the expensive lab equipment (in addition to increasing researchers' productivity). We detail the requirements for such a framework regarding the needs of our case study of the CEA, review existing solutions and recommend the choice of Galaxy. We then analyze the performance limitations of the proposed architecture, and point to the connection between centralized storage and the processing nodes as the critical point. We also conduct a performance evaluation on an experimental platform to observe the limitations encountered in practice. 
We finish by pointing out issues that are not addressed by existing solutions and are therefore directions for future work in the research field.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116415853","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Evaluation of Circuits on the Reconfigurable Mesh
Y. Ben-Asher, Esti Stein
{"title":"Evaluation of Circuits on the Reconfigurable Mesh","authors":"Y. Ben-Asher, Esti Stein","doi":"10.1109/IPDPSW.2019.00020","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00020","url":null,"abstract":"The Reconfigurable Mesh (RM) is a grid of Processing Elements (PEs) that use dynamic reconfigurations to create varying bus-segments between its PEs. This allows the RM to perform computations such as sorting or counting in a constant number of steps. It has long been speculated that the RM's dynamic reconfiguration should replace the static reconfiguration architecture of the FPGA. In this work, we show that the RM can be used not only to accelerate specific computations such as sorting or summing but also for speeding up the main function of the FPGA, namely evaluation of Boolean Circuits (BCs). We propose an RM algorithm to evaluate BCs and show that it can be done without size blow-up. Moreover, like in the FPGA, it can be done using a grid of tri-state switching elements, rather than a grid of PEs as is the case with the regular RM. This model is called FPRM, and preliminary ASIC synthesis results illustrate that the FPRM architecture is about 2X faster and also more efficient in power/area than the FPGA routing infrastructure.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"34 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"123431871","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FIFO-Based Hardware Sorters for High Bandwidth Memory
K. Nakano, Yasuaki Ito, J. Bordim
{"title":"FIFO-Based Hardware Sorters for High Bandwidth Memory","authors":"K. Nakano, Yasuaki Ito, J. Bordim","doi":"10.1109/IPDPSW.2019.00112","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00112","url":null,"abstract":"The main contribution of this paper is to show efficient FIFO-based hardware sorters that sort n elements of w bits each, stored in a high bandwidth memory with modest access latency. We assume that each address of the high bandwidth memory can store p elements of w bits each, which can be read or written at the same time. We assume that the high bandwidth memory has an access latency l, that is, it takes l clock cycles to access the p elements at a specified address. Furthermore, burst mode is supported and k (≥ 1) consecutive addresses can be accessed in k+l-1 clock cycles in a pipelined fashion. However, if k addresses are not consecutive, kl clock cycles are necessary to access all of them. Clearly, all n elements arranged in n/p addresses can be duplicated in 2(n/p+l-1) clock cycles. We present two types of hardware sorters that sort n=rc elements stored in an r×c matrix of the high bandwidth memory. We first develop Three-Pass-Sort and Four-Pass-Sort that sort an r×c matrix by reading from and writing to it three times and four times, respectively. We implement these two algorithms using FIFO-based mergers that can be configured in pairwise mode or sliding mode. Our hardware sorter based on Three-Pass-Sort runs in 6n/p+3c^2/p^2l+O(c/p(l+log r)+r) clock cycles using a circuit of size O(rwp) provided that r≥c^2. 
Also, our hardware sorter based on Four-Pass-Sort runs in 8n/p+2c^2l+O(cl+log r+p) clock cycles using a circuit of size O(rw).","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"71 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"127152603","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
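The memory timing model stated in the abstract above can be checked with a little arithmetic. The following is a minimal sketch, assuming n is a multiple of p; the function names are illustrative and do not come from the paper.

```python
def burst_cycles(k: int, latency: int) -> int:
    """Cycles to access k consecutive addresses in pipelined burst mode: k + l - 1."""
    return k + latency - 1


def scattered_cycles(k: int, latency: int) -> int:
    """Cycles to access k non-consecutive addresses, each paying the full latency: k * l."""
    return k * latency


def duplicate_cycles(n: int, p: int, latency: int) -> int:
    """Cycles to copy n elements stored p per address: one burst read plus one burst write.

    Assumes n is a multiple of p, so the data occupies exactly n / p addresses.
    """
    addresses = n // p
    return 2 * burst_cycles(addresses, latency)


if __name__ == "__main__":
    n, p, latency = 1024, 16, 8          # hypothetical parameters for illustration
    print(burst_cycles(n // p, latency))      # 64 + 8 - 1 = 71
    print(scattered_cycles(n // p, latency))  # 64 * 8 = 512
    print(duplicate_cycles(n, p, latency))    # 2 * 71 = 142
```

With these hypothetical parameters, scattered access costs roughly seven times as much as burst access, which is why the model rewards algorithms that touch consecutive addresses.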
A Reinforcement Learning Scheduling Strategy for Parallel Cloud-Based Workflows
André Nascimento, Victor Olimpio, V. Silva, A. Paes, Daniel de Oliveira
{"title":"A Reinforcement Learning Scheduling Strategy for Parallel Cloud-Based Workflows","authors":"André Nascimento, Victor Olimpio, V. Silva, A. Paes, Daniel de Oliveira","doi":"10.1109/IPDPSW.2019.00134","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00134","url":null,"abstract":"Scientific experiments can be modeled as Workflows. Such Workflows are usually compute- and data-intensive, demanding the use of High-Performance Computing environments such as clusters, grids, and clouds. The latter offer the advantage of elasticity, which allows for increasing and/or decreasing the number of Virtual Machines (VMs) on demand. Workflows are typically managed using Scientific Workflow Management Systems (SWfMS). Many existing SWfMSs offer support for cloud-based execution. Each SWfMS has its own scheduler that follows a well-defined cost function. However, such cost functions must consider the characteristics of a dynamic environment, such as live migrations and/or performance fluctuations, which are far from trivial to model. This paper proposes a novel scheduling strategy, named ReASSIgN, based on Reinforcement Learning (RL). By relying on an RL technique, one may assume that there is an optimal (or sub-optimal) solution for the scheduling problem, and aim at learning the best scheduling based on previous executions in the absence of a mathematical model of the environment. For this, an extension of the well-known workflow simulator WorkflowSim is proposed to implement an RL strategy for scheduling workflows. Once the scheduling plan is generated, the workflow is executed in the cloud using the SciCumulus SWfMS. 
We conducted a thorough evaluation of the proposed scheduling strategy using a real astronomy workflow.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"2016 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"129919068","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 9
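To illustrate what learning a schedule from previous executions without an environment model can look like, here is a minimal sketch of tabular Q-learning applied to task-to-VM assignment. The abstract does not say which RL technique ReASSIgN uses, so the choice of Q-learning, the reward (negative runtime), and all names below are assumptions made purely for illustration.

```python
import random
from collections import defaultdict


def choose_vm(q, task, vms, epsilon, rng):
    """Epsilon-greedy choice: explore a random VM, otherwise pick the best-known one."""
    if rng.random() < epsilon:
        return rng.choice(vms)
    return max(vms, key=lambda vm: q[(task, vm)])


def update(q, task, vm, reward, next_task, vms, alpha=0.5, gamma=0.9):
    """Standard Q-learning update for the (task, vm) pair that was just executed."""
    if next_task is None:
        best_next = 0.0  # last task of the workflow: no future value
    else:
        best_next = max(q[(next_task, v)] for v in vms)
    q[(task, vm)] += alpha * (reward + gamma * best_next - q[(task, vm)])


if __name__ == "__main__":
    rng = random.Random(0)
    vms = ["vm_a", "vm_b"]                      # hypothetical VM names
    runtime = {"vm_a": 10.0, "vm_b": 4.0}       # hypothetical observed runtimes
    q = defaultdict(float)
    for _ in range(200):                        # repeated executions of a one-task workflow
        vm = choose_vm(q, "t1", vms, epsilon=0.2, rng=rng)
        update(q, "t1", vm, reward=-runtime[vm], next_task=None, vms=vms)
    print(choose_vm(q, "t1", vms, epsilon=0.0, rng=rng))
```

The table `q` plays the role of the learned cost function: it is filled in from observed rewards only, so performance fluctuations show up in the estimates without being modeled explicitly.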
Approximate and Exact Selection on GPUs
T. Ribizel, H. Anzt
{"title":"Approximate and Exact Selection on GPUs","authors":"T. Ribizel, H. Anzt","doi":"10.1109/IPDPSW.2019.00088","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00088","url":null,"abstract":"We present a novel algorithm for parallel selection on GPUs. The algorithm requires no assumptions on the input data distribution, and has a much lower recursion depth compared to many state-of-the-art algorithms. We implement the algorithm for different GPU generations, in each case using the low-level communication features available, and assess the performance on server-line hardware. The computational complexity of our SampleSelect algorithm is comparable to specialized algorithms designed for - and exploiting the characteristics of - \"pleasant\" data distributions. At the same time, as SampleSelect does not work on the actual values but the ranks of the elements only, it is robust to the input data and can complete significantly faster for adversarial data distributions. In addition to the exact SampleSelect, we address the use case of approximate selection by designing a variant that radically reduces the computational cost while preserving high approximation accuracy.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"18 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"114168780","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
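The general idea of sample-based selection, which relies only on comparisons between elements and not on their values, can be sketched sequentially as follows. This is not the authors' GPU implementation: the bucket count, the sample size, the base-case threshold, and the fallback for duplicate-heavy inputs are all assumptions chosen for illustration.

```python
import random
from bisect import bisect_right


def sample_select(data, k, num_buckets=8, base_case=32, rng=None):
    """Return the element of rank k (0-based) in data, using comparisons only."""
    rng = rng or random.Random(0)
    items = list(data)
    if not 0 <= k < len(items):
        raise IndexError("rank out of range")
    while len(items) > base_case:
        # Sort a small random sample and take evenly spaced splitters from it.
        sample = sorted(rng.sample(items, min(len(items), 4 * num_buckets)))
        step = len(sample) // num_buckets
        splitters = [sample[i * step] for i in range(1, num_buckets)]
        # Distribute elements into buckets; on a GPU this counting step runs in parallel.
        buckets = [[] for _ in range(num_buckets)]
        for x in items:
            buckets[bisect_right(splitters, x)].append(x)
        # Walk the buckets in order to find the one that contains rank k.
        for bucket in buckets:
            if k < len(bucket):
                break
            k -= len(bucket)
        if len(bucket) == len(items):
            break  # no progress (e.g. all elements equal): finish by sorting
        items = bucket
    return sorted(items)[k]


if __name__ == "__main__":
    values = [9, 1, 8, 2, 7, 3, 6, 4, 5, 0] * 10
    print(sample_select(values, 50))  # 5
```

Because each round keeps only one of several buckets, the input shrinks much faster than with a two-way partition, which is the intuition behind the low recursion depth mentioned in the abstract.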
Teaching High Performance Computing through Parallel Programming Marathons
L. A. J. Marzulo, Calebe P. Bianchini, Leandro Santiago, V. C. Ferreira, Brunno F. Goldstein, F. França
{"title":"Teaching High Performance Computing through Parallel Programming Marathons","authors":"L. A. J. Marzulo, Calebe P. Bianchini, Leandro Santiago, V. C. Ferreira, Brunno F. Goldstein, F. França","doi":"10.1109/IPDPSW.2019.00058","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00058","url":null,"abstract":"Parallel and distributed programming is essential for exploiting the processing power of modern computing platforms. However, during the first years of a Computer Science course, students usually learn problem solving techniques, data structures and programming paradigms that are inherently sequential, hindering the transition to parallel architectures. Parallel Programming Marathons organized in Brazil are similar to other Programming Competitions around the world and have been used for teaching and stimulating undergraduate and graduate students to learn to \"think in parallel\" and to develop applications for different parallel architectures, including multicores, clusters and accelerators. This paper presents the structure of this Parallel Programming Marathon and an overview of how it supports regional and national contests. Also, this work presents use cases from Parallel and Distributed Computing courses at two different Brazilian universities that use a challenge-based learning approach and employ marathon problems as course assignments. 
This approach helped increase students' interest in High Performance Computing.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"116698228","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
BRICS – Efficient Techniques for Estimating the Farness-Centrality in Parallel
Sai Charan Regunta, Sai Harsh Tondomker, Kishore Kothapalli
{"title":"BRICS – Efficient Techniques for Estimating the Farness-Centrality in Parallel","authors":"Sai Charan Regunta, Sai Harsh Tondomker, Kishore Kothapalli","doi":"10.1109/IPDPSW.2019.00110","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00110","url":null,"abstract":"In this paper, we study scalable parallel algorithms for estimating the farness-centrality value of the nodes in a given undirected and connected graph. Our algorithms consider approaches that are more suitable for sparse graphs. To this end, we propose four optimization techniques based on removing redundant nodes, removing identical nodes, removing chain nodes, and making use of decomposition based on the biconnected components of the input graph. We test our techniques on a collection of real-world graphs for the time taken and the average error percentage. We further analyze the applicability of our techniques on various classes of real-world graphs. We suggest why certain techniques work better on certain classes of graphs.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"126193857","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
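For reference, the farness of a node is the sum of its shortest-path distances to all other nodes. The sketch below computes it exactly with one breadth-first search, assuming an unweighted, undirected, connected graph given as an adjacency dictionary. It shows only the baseline quantity that the paper's techniques estimate, not any of the four optimizations.

```python
from collections import deque


def farness(adj, source):
    """Sum of BFS distances from source to every other node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return sum(dist.values())


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 - 3 (hypothetical example input).
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(farness(path, 0))  # 1 + 2 + 3 = 6
    print(farness(path, 1))  # 1 + 1 + 2 = 4
```

Computing this for every node costs one BFS per node, which is the expense that motivates estimation and graph-reduction techniques on large sparse graphs.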
Are we Doing the Right Thing? — A Critical Analysis of the Academic HPC Community
H. Anzt, Goran Flegar
{"title":"Are we Doing the Right Thing? — A Critical Analysis of the Academic HPC Community","authors":"H. Anzt, Goran Flegar","doi":"10.1109/IPDPSW.2019.00122","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00122","url":null,"abstract":"Like in any other research field, academically surviving in the High Performance Computing (HPC) community generally requires publishing papers, in the best case many of them and in high-ranked journals or at top-tier conferences. As a result, the number of scientific papers published each year in this relatively small community easily outnumbers what a single researcher can read. At the same time, many of the proposed and analyzed strategies, algorithms, and hardware-optimized implementations never make it beyond the prototype stage, as they are abandoned once they have served the single purpose of yielding (another) publication. In a time and field where high-quality manpower is a scarce resource, this is extremely inefficient. In this position paper we promote a radical paradigm shift towards accepting high-quality software patches to community software packages as legitimate conference contributions. 
In consequence, the reputation and appointability of researchers is no longer based on the classical scientific metrics, but on the quality and documentation of open source software contributions — effectively improving and accelerating the collaborative development of community software.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130517162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Heterogeneous Active Messages for Offloading on the NEC SX-Aurora TSUBASA
M. Noack, E. Focht, T. Steinke
{"title":"Heterogeneous Active Messages for Offloading on the NEC SX-Aurora TSUBASA","authors":"M. Noack, E. Focht, T. Steinke","doi":"10.1109/IPDPSW.2019.00014","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00014","url":null,"abstract":"The NEC SX-Aurora TSUBASA is a new generation of vector processing architectures that combines a standard Intel Xeon host with the newly developed NEC Vector Engine coprocessor cards. One way to use these coprocessors is offloading suitable parts of the program from the host to the Vector Engines. Currently, the only vendor-provided offloading solutions are the low-level Vector Engine Offloading (VEO) library, and a built-in reverse-offloading mechanism named VHcall. In this work, we extend the portable Heterogeneous Active Messages (HAM) based HAM-Offload framework with support for the NEC SX-Aurora TSUBASA. To this end, we design, implement, and evaluate two messaging protocols aimed at minimising offloading cost. This sheds some light on how to achieve fast communication between host CPU and the Vector Engines of the NEC SX-Aurora TSUBASA. Compared with VEO, the DMA-based protocol reduces offloading overhead by a factor of 13. The resulting framework enables users to write portable offload applications with low overhead that require neither a language extension like OpenMP nor a special language like OpenCL. 
Existing HAM-Offload applications are now ready to run on the NEC SX-Aurora TSUBASA.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"130762110","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
A Fast Local Algorithm for Track Reconstruction on Parallel Architectures
D. C. Pérez, N. Neufeld, A. Riscos-Núñez
{"title":"A Fast Local Algorithm for Track Reconstruction on Parallel Architectures","authors":"D. C. Pérez, N. Neufeld, A. Riscos-Núñez","doi":"10.1109/IPDPSW.2019.00118","DOIUrl":"https://doi.org/10.1109/IPDPSW.2019.00118","url":null,"abstract":"The reconstruction of particle trajectories, tracking, is a central process in the reconstruction of particle collisions in High Energy Physics detectors. At the LHCb detector in the Large Hadron Collider, bunches of particles collide 30 million times per second. These collisions produce about 10^9 particle trajectories per second that need to be reconstructed in real time, in order to filter and store data. Upcoming improvements in the LHCb detector will deprecate the hardware filter in favour of a full software filter, posing a computing challenge that requires a renovation of current algorithms and the underlying hardware. We present Search by triplet, a local tracking algorithm optimized for parallel architectures. We design our algorithm to reduce Read-After-Write dependencies as well as conditional branches, increasing the potential for parallelization. We analyze the complexity of our algorithm and validate our results. We show the scaling of our algorithm for an increasing number of collision events. We show sustained tests for our algorithm sequence given a simulated dataflow. We develop CPU and GPU implementations of our work, and hide the transmission times between device and host by executing a multi-stream pipeline. Our results provide a reliable basis for an informed assessment of the feasibility of LHCb event reconstruction on parallel architectures, enabling us to develop cost models for upcoming technology upgrades. 
The created software infrastructure is extensible and permits the addition of subsequent reconstruction algorithms.","PeriodicalId":292054,"journal":{"name":"2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0,"publicationDate":"2019-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"128371860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 7