Parallel Computing: Latest Publications

A parallel non-convex approximation framework for risk parity portfolio design
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.102999
Yidong Chen, Chen Li, Yonghong Hu, Zhonghua Lu
{"title":"A parallel non-convex approximation framework for risk parity portfolio design","authors":"Yidong Chen ,&nbsp;Chen Li ,&nbsp;Yonghong Hu ,&nbsp;Zhonghua Lu","doi":"10.1016/j.parco.2023.102999","DOIUrl":"https://doi.org/10.1016/j.parco.2023.102999","url":null,"abstract":"<div><p>In this paper, we propose a parallel non-convex approximation framework (NCAQ) for optimization problems whose objective is to minimize a convex function plus the sum of non-convex functions. Based on the structure of the objective function, our framework transforms the non-convex constraints to the logarithmic barrier function and approximates the non-convex problem by a parallel quadratic approximation scheme, which will allow the original problem to be solved by accelerated inexact gradient descent in the parallel environment. Moreover, we give a detailed convergence analysis for the proposed framework. The numerical experiments show that our framework outperforms the state-of-art approaches in terms of accuracy and computation time on the high dimension non-convex Rosenbrock test functions and the risk parity problems. In particular, we implement the proposed framework on CUDA, showing a more than 25 times speed-up ratio and removing the computational bottleneck for non-convex risk-parity portfolio design. Finally, we construct the high dimension risk parity portfolio which can consistently outperform the equal weight portfolio in the application of Chinese stock markets.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"116 ","pages":"Article 102999"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49756831","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
An optimal scheduling algorithm considering the transactions worst-case delay for multi-channel hyperledger fabric network
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.103041
Ou Wu, Shanshan Li, He Zhang, Liwen Liu, Haoming Li, Yanze Wang, Ziyi Zhang
{"title":"An optimal scheduling algorithm considering the transactions worst-case delay for multi-channel hyperledger fabric network","authors":"Ou Wu, Shanshan Li, He Zhang, Liwen Liu, Haoming Li, Yanze Wang, Ziyi Zhang","doi":"10.1016/j.parco.2023.103041","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103041","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 1","pages":"103041"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"55107811","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A survey of software techniques to emulate heterogeneous memory systems in high-performance computing
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.103023
Clément Foyer, Brice Goglin, Andrès Rubio Proaño
{"title":"A survey of software techniques to emulate heterogeneous memory systems in high-performance computing","authors":"Clément Foyer,&nbsp;Brice Goglin,&nbsp;Andrès Rubio Proaño","doi":"10.1016/j.parco.2023.103023","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103023","url":null,"abstract":"<div><p><span>Heterogeneous memory will be involved in several upcoming platforms on the way to exascale. Combining technologies such as HBM, DRAM and/or </span>NVDIMM<span> allows to tackle the needs of different applications in terms of bandwidth, latency or capacity. And new memory interconnects such as CXL bring easy ways to attach these technologies to the processors.</span></p><p>High-performance computing developers must prepare their runtimes and applications for these architectures, even before they are actually available. Hence, we survey software solutions for emulating them. First, we list many ways to modify the performance of platforms so that developers may test their code under different memory performance profiles. This is required to identify kernels and data buffers that are sensitive to memory performance.</p><p>Then, we present several techniques for exposing fake heterogeneous memory information to the software stack. This is useful for adapting runtimes and applications to heterogeneous memory so that different kinds of memory are detected at runtime and so that buffers are allocated in the appropriate one.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"116 ","pages":"Article 103023"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49756349","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
A lightweight semi-centralized strategy for the massive parallelization of branching algorithms
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.103024
Andres Pastrana-Cruz, Manuel Lafond
{"title":"A lightweight semi-centralized strategy for the massive parallelization of branching algorithms","authors":"Andres Pastrana-Cruz,&nbsp;Manuel Lafond","doi":"10.1016/j.parco.2023.103024","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103024","url":null,"abstract":"<div><p>Several NP-hard problems are solved exactly using exponential-time branching strategies, whether it be branch-and-bound algorithms, or bounded search trees in fixed-parameter algorithms. The number of tractable instances that can be handled by sequential algorithms is usually small, whereas massive parallelization has been shown to significantly increase the space of instances that can be solved exactly. However, previous centralized approaches require too much communication to be efficient, whereas decentralized approaches are more efficient but have difficulty keeping track of the global state of the exploration.</p><p>In this work, we propose to revisit the centralized paradigm while avoiding previous bottlenecks. In our strategy, the center has lightweight responsibilities, requires only a few bits for every communication, but is still able to keep track of the progress of every worker. In particular, the center never holds any task but is able to guarantee that a process with no work always receives the highest priority task globally.</p><p>Our strategy was implemented in a generic C++ library called GemPBA, which allows a programmer to convert a sequential branching algorithm into a parallel version by changing only a few lines of code. An experimental case study on the vertex cover problem demonstrates that some of the toughest instances from the DIMACS challenge graphs that would take months to solve sequentially can be handled within two hours with our approach.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"116 ","pages":"Article 103024"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49756350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.103020
Lukas Reitz, Kai Hardenbicker, Tobias Werner, Claudia Fohry
{"title":"Lifeline-based load balancing schemes for Asynchronous Many-Task runtimes in clusters","authors":"Lukas Reitz,&nbsp;Kai Hardenbicker,&nbsp;Tobias Werner,&nbsp;Claudia Fohry","doi":"10.1016/j.parco.2023.103020","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103020","url":null,"abstract":"<div><p><span>A popular approach to program scalable irregular applications is Asynchronous Many-Task (AMT) Programming. Here, programs define tasks according to task models such as dynamic independent tasks (DIT) or nested fork-join (NFJ). We consider cluster AMTs, in which a runtime system maps the tasks to worker </span>threads in multiple processes.</p><p>Thereby, dynamic load balancing can be achieved via cooperative work stealing, coordinated work stealing, or work sharing. A well-performing cooperative work stealing variant is the lifeline scheme. While previous implementations of this scheme are restricted to single-worker processes, a recent hybrid extension combines it with intra-process work sharing between multiple workers. The hybrid scheme, which was proposed for both DIT and NFJ, comes at the price of a higher complexity.</p><p>This paper investigates whether this complexity is indispensable for multi-worker processes by contrasting the hybrid scheme with a novel pure work stealing extension of the lifeline scheme to multiple workers. We independently implemented the extension for DIT and NFJ. In experiments based on four benchmarks, we observed the pure scheme to be on a par or even outperform the hybrid one by up to 18% for DIT and up to 5% for NFJ.</p><p>Building on this main result, we studied a modification of the pure scheme, which prefers local over global victims, and more heavily loaded over less loaded ones. The modification improves the performance of the pure scheme by up to 15%. Finally, we explored whether the lifeline scheme can profit from a change to coordinated work stealing. We developed a coordinated multi-worker implementation for DIT and observed a performance improvement over the cooperative scheme by up to 17%.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"116 ","pages":"Article 103020"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49728376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.103017
Zeshi Liu, Zhen Xie, Wenqian Dong, Mengting Yuan, Haihang You, Dong Li
{"title":"A heterogeneous processing-in-memory approach to accelerate quantum chemistry simulation","authors":"Zeshi Liu ,&nbsp;Zhen Xie ,&nbsp;Wenqian Dong ,&nbsp;Mengting Yuan ,&nbsp;Haihang You ,&nbsp;Dong Li","doi":"10.1016/j.parco.2023.103017","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103017","url":null,"abstract":"<div><p><span><span>The “memory wall” is an architectural property introducing high memory access latency that can manifest application performance, and this wall becomes even taller in the context of big data. Although the use of GPU-based systems could achieve high performance, it is difficult to improve the utilization of </span>GPU<span> systems due to the “memory wall”. The intensive data exchange and computation remains a challenge when confronting applications with a massive memory footprint<span>. Quantum-mechanics-based ab initio calculations, which leverage high-performance computing to investigate multi-electron systems, have been widely used in computational chemistry. However, ab initio calculations are labor-intensive and can easily consume more than hundreds of gigabytes of memory. Previous efforts on heterogeneous accelerators via GPU and CPU suffer from high-latency off-device memory access. In this paper, we introduce heterogeneous processing-in-memory (PIM) to mitigate the overhead of data movement between CPUs and GPUs, and deeply analyze two of the most memory-intensive parts of the quantum chemistry, for example, the FFT<span> and time-consuming loops. Specifically, we exploit runtime systems and programming models to improve hardware utilization and simplify programming efforts by moving computation close to the data and eliminating hardware idling. We take a widely used software, the QUANTUM ESPRESSO (opEn-Source Package for Research in Electronic Structure, Simulation, and Optimization), to perform our experiments, and our results show that our design provides up to </span></span></span></span><span><math><mrow><mn>4</mn><mo>.</mo><mn>09</mn><mo>×</mo></mrow></math></span> and <span><math><mrow><mn>2</mn><mo>.</mo><mn>60</mn><mo>×</mo></mrow></math></span> of performance improvement and 71% and 88% energy reduction over CPU and GPU (NVIDIA P100), respectively.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"116 ","pages":"Article 103017"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49728436","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
New YARN sharing GPU based on graphics memory granularity scheduling
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-07-01 DOI: 10.1016/j.parco.2023.103038
Jinliang Shi, Dewu Chen, Jiabi Liang, Lin Li, Yue-ying Lin, Jianjiang Li
{"title":"New YARN sharing GPU based on graphics memory granularity scheduling","authors":"Jinliang Shi, Dewu Chen, Jiabi Liang, Lin Li, Yue-ying Lin, Jianjiang Li","doi":"10.1016/j.parco.2023.103038","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103038","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 1","pages":"103038"},"PeriodicalIF":1.4,"publicationDate":"2023-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"55107745","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Big data BPMN workflow resource optimization in the cloud
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-06-01 DOI: 10.1016/j.parco.2023.103025
S. Simić, Nikola Tanković, D. Etinger
{"title":"Big data BPMN workflow resource optimization in the cloud","authors":"S. Simić, Nikola Tanković, D. Etinger","doi":"10.1016/j.parco.2023.103025","DOIUrl":"https://doi.org/10.1016/j.parco.2023.103025","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"117 1","pages":"103025"},"PeriodicalIF":1.4,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"55107045","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
GPU acceleration of Levenshtein distance computation between long strings
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-04-01 DOI: 10.2139/ssrn.4244720
David Castells-Rufas
{"title":"GPU acceleration of Levenshtein distance computation between long strings","authors":"David Castells-Rufas","doi":"10.2139/ssrn.4244720","DOIUrl":"https://doi.org/10.2139/ssrn.4244720","url":null,"abstract":"","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"91 1","pages":"103019"},"PeriodicalIF":1.4,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"80523791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Uphill resampling for particle filter and its implementation on graphics processing unit
IF 1.4, CAS Q4, Computer Science
Parallel Computing Pub Date: 2023-02-01 DOI: 10.1016/j.parco.2022.102994
Özcan Dülger, Halit Oğuztüzün, Mübeccel Demirekler
{"title":"Uphill resampling for particle filter and its implementation on graphics processing unit","authors":"Özcan Dülger ,&nbsp;Halit Oğuztüzün ,&nbsp;Mübeccel Demirekler","doi":"10.1016/j.parco.2022.102994","DOIUrl":"https://doi.org/10.1016/j.parco.2022.102994","url":null,"abstract":"<div><p>We introduce a new resampling method, named Uphill, that is free from numerical instability and suitable for parallel implementation on graphics processing unit (GPU). Common resampling algorithms such as Systematic suffer from numerical instability when single precision floating point numbers are used. This is due to cumulative summation over the weights of particles when the weights differ widely or the number of particles is large. The Metropolis and Rejection resampling algorithms do not suffer from numerical instability as they only calculate the ratios of weights pairwise rather than perform collective operations over the weights. They are more suitable for the GPU implementation of the particle filter. However, they undergo non-coalesced global memory access patterns which cause their speed deteriorate rapidly as the number of particles gets large. Uphill also does not suffer from numerical instability but, experiences the same non-coalesced global memory access problem with Metropolis and Rejection. We introduce its faster version named Uphill-Fast which eliminates this problem. We make comparisons of Uphill and Uphill-Fast with the Systematic, Metropolis, Metropolis-C2 and Rejection resampling methods with respect to quality and speed. We also compare them on a highly non-linear system. Uphill-Fast runs faster and attains similar quality, in terms of RMSE, in comparison with Metropolis and Rejection when the number of particles is very large. Uphill-Fast runs with roughly same speed as Metropolis-C2 with better variance and MSE when the number of particles is very large.</p></div>","PeriodicalId":54642,"journal":{"name":"Parallel Computing","volume":"115 ","pages":"Article 102994"},"PeriodicalIF":1.4,"publicationDate":"2023-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49702532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0