Gokul Madathupalyam Chinnappan , Bharadwaj Veeravalli , Koen Mouthaan , John Wen-Hao Lee
{"title":"Experimental evaluation of a multi-installment scheduling strategy based on divisible load paradigm for SAR image reconstruction on a distributed computing infrastructure","authors":"Gokul Madathupalyam Chinnappan , Bharadwaj Veeravalli , Koen Mouthaan , John Wen-Hao Lee","doi":"10.1016/j.jpdc.2024.104942","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104942","url":null,"abstract":"<div><p>Radar loads, especially Synthetic Aperture Radar (SAR) image reconstruction loads, use a large volume of data collected from satellites to create a high-resolution image of the earth. To design near-real-time applications that utilise SAR data, speeding up the image reconstruction algorithm is imperative. This can be achieved by deploying a set of distributed computing infrastructures connected through a network. Scheduling such complex and large divisible loads on a distributed platform can be designed using the Divisible Load Theory (DLT) framework. We performed distributed SAR image reconstruction experiments using the SLURM library on a cloud virtual machine network using two scheduling strategies, namely the Multi-Installment Scheduling with Result Retrieval (MIS-RR) strategy and the traditional EQual-partitioning Strategy (EQS). The DLT model proposed in the MIS-RR strategy is incorporated to make the load divisible. Based on the experimental results and performance analysis carried out using different pixel lengths, pulse set sizes, and the number of virtual machines, we observe that the time performance of MIS-RR is much superior to that of EQS. Hence the MIS-RR strategy is of practical significance in reducing the overall processing time and cost and in improving the utilisation of the compute infrastructure. 
Furthermore, we note that the DLT-based theoretical analysis of MIS-RR coincides well with the experimental data, demonstrating the relevance of DLT in the real world.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"193 ","pages":"Article 104942"},"PeriodicalIF":3.4,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141582082","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yashar Naderzadeh , Daniel Grosu , Ratna Babu Chinnam
{"title":"PPB-MCTS: A novel distributed-memory parallel partial-backpropagation Monte Carlo tree search algorithm","authors":"Yashar Naderzadeh , Daniel Grosu , Ratna Babu Chinnam","doi":"10.1016/j.jpdc.2024.104944","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104944","url":null,"abstract":"<div><p>Monte-Carlo Tree Search (MCTS) is an adaptive and heuristic tree-search algorithm designed to uncover sub-optimal actions at each decision-making point. This method progressively constructs a search tree by gathering samples throughout its execution. Predominantly applied within the realm of gaming, MCTS has exhibited exceptional achievements. Additionally, it has displayed promising outcomes when employed to solve NP-hard combinatorial optimization problems. MCTS has been adapted for distributed-memory parallel platforms. The primary challenges associated with distributed-memory parallel MCTS are the substantial communication overhead and the necessity to balance the computational load among various processes. In this work, we introduce a novel distributed-memory parallel MCTS algorithm with partial backpropagations, referred to as <em>Parallel Partial-Backpropagation MCTS</em> (<span>PPB-MCTS</span>). Our design approach aims to significantly reduce the communication overhead while maintaining, or even slightly improving, the performance in the context of combinatorial optimization problems. To address the communication overhead challenge, we propose a strategy involving transmitting an additional backpropagation message. This strategy avoids attaching an information table to the communication messages exchanged by the processes, thus reducing the communication overhead. Furthermore, this approach contributes to enhancing the decision-making accuracy during the selection phase. The load balancing issue is also effectively addressed by implementing a shared transposition table among the parallel processes. 
Furthermore, we introduce two primary methods for managing duplicate states within distributed-memory parallel MCTS, drawing upon techniques utilized in addressing duplicate states within sequential MCTS. Duplicate states can transform the conventional search tree into a Directed Acyclic Graph (DAG). To evaluate the performance of our proposed parallel algorithm, we conduct an extensive series of experiments on solving instances of the Job-Shop Scheduling Problem (JSSP) and the Weighted Set-Cover Problem (WSCP). These problems are recognized for their complexity and classified as NP-hard combinatorial optimization problems with considerable relevance within industrial applications. The experiments are performed on a cluster of computers with many cores. The empirical results highlight the enhanced scalability of our algorithm compared to that of the existing distributed-memory parallel MCTS algorithms. As the number of processes increases, our algorithm demonstrates increased rollout efficiency while maintaining an improved load balance across processes.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"193 ","pages":"Article 104944"},"PeriodicalIF":3.4,"publicationDate":"2024-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141480204","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yangyang Long , Changgen Peng , Weijie Tan , Yuling Chen
{"title":"Blockchain-assisted full-session key agreement for secure data sharing in cloud computing","authors":"Yangyang Long , Changgen Peng , Weijie Tan , Yuling Chen","doi":"10.1016/j.jpdc.2024.104943","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104943","url":null,"abstract":"<div><p>Data sharing in cloud computing allows multiple data owners to freely share their data resources while security and privacy issues remain inevitable challenges. As a foundation of secure communication, the authenticated key agreement (AKA) scheme has been recognized as a promising approach to solve such problems. However, most existing AKA schemes are based on a cloud-based architecture, so privacy and security issues will inevitably occur once the centralized authority is attacked. Besides, most previous schemes require an online registration authority for authentication, which may consume significant resources. To address these drawbacks, for secure data sharing in cloud computing, a blockchain-assisted full-session key agreement scheme is proposed. After the registration phase, the registration authority does not engage in the authentication and key agreement process. By utilizing blockchain technology, a common session key between the remote user and cloud server can be negotiated, and a shared group key among multiple remote users can be negotiated without private information leakage. Formal and informal security proofs demonstrate that the proposed scheme is able to meet the security and privacy requirements. 
The detailed performance evaluation shows that the proposed scheme has lower computation costs and acceptable communication overheads while ensuring superior security.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"193 ","pages":"Article 104943"},"PeriodicalIF":3.4,"publicationDate":"2024-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141480203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Francesco Sgherzi , Marco Siracusa , Ivan Fernandez , Adrià Armejach , Miquel Moretó
{"title":"SpChar: Characterizing the sparse puzzle via decision trees","authors":"Francesco Sgherzi , Marco Siracusa , Ivan Fernandez , Adrià Armejach , Miquel Moretó","doi":"10.1016/j.jpdc.2024.104941","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104941","url":null,"abstract":"<div><p>Sparse matrix computation is crucial in various modern applications, including large-scale graph analytics, deep learning, and recommender systems. The performance of sparse kernels varies greatly depending on the structure of the input matrix, making it difficult to gain a comprehensive understanding of sparse computation and its relationship to inputs, algorithms, and target machine architecture. Despite extensive research on certain sparse kernels, such as Sparse Matrix-Vector Multiplication (SpMV), the overall family of sparse algorithms has yet to be investigated as a whole. This paper introduces SpChar, a workload characterization methodology for general sparse computation. SpChar employs tree-based models to identify the most relevant hardware and input characteristics, starting from hardware and input-related metrics gathered from Performance Monitoring Counters (PMCs) and matrices. Our analysis enables the creation of a <em>characterization loop</em> that facilitates the optimization of sparse computation by mapping the impact of architectural features to inputs and algorithmic choices. We apply SpChar to more than 600 matrices from the SuiteSparse Matrix collection and three state-of-the-art Arm Central Processing Units (CPUs) to determine the critical hardware and software characteristics that affect sparse computation. In our analysis, we determine that the biggest limiting factors for high-performance sparse computation are (1) the latency of the memory system, (2) the pipeline flush overhead resulting from branch misprediction, and (3) the poor reuse of cached elements. 
Additionally, we propose software and hardware optimizations that designers can implement to create a platform suitable for sparse computation. We then investigate these optimizations using the gem5 simulator to achieve a significant speedup of up to 2.63× compared to a CPU where the optimizations are not applied.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104941"},"PeriodicalIF":3.4,"publicationDate":"2024-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141433984","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Huifeng Yuan , Lijing Cheng , Yuying Pan , Zhetao Tan , Qian Liu , Zhong Jin
{"title":"A multi-level parallel approach to increase the computation efficiency of a global ocean temperature dataset reconstruction","authors":"Huifeng Yuan , Lijing Cheng , Yuying Pan , Zhetao Tan , Qian Liu , Zhong Jin","doi":"10.1016/j.jpdc.2024.104938","DOIUrl":"10.1016/j.jpdc.2024.104938","url":null,"abstract":"<div><p>There is an increasing need to provide real-time datasets for climate monitoring and applications. However, the current data products from all international groups have at least a month delay for data release. One reason for this delay is the long computing time of the global reconstruction algorithm (so-called mapping approach). To tackle this issue, this paper proposes a multi-level parallel computing model to improve the efficiency of data construction by parallelization of computation, reducing code branch prediction, optimizing data spatial locality, cache utilization, and other measures. This model has been applied to a mapping approach proposed by the Institute of Atmospheric Physics (IAP), one of the world's most widely used data products in the ocean and climate field. Compared with the traditional serial MATLAB-based construction scheme on a single node, the construction after parallel optimizations is ∼4.7 times faster. A large-scale parallel experiment on a long-term (∼1000 months) gridded dataset utilizing over 16,000 processor cores demonstrates the model's scalability, with an improvement of ∼1200 times. 
In summary, this new model represents another example of the application of high-performance computing in oceanography and climatology.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104938"},"PeriodicalIF":3.8,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141405481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using hardware-transactional-memory support to implement speculative task execution","authors":"Juan Salamanca , Alexandro Baldassin","doi":"10.1016/j.jpdc.2024.104939","DOIUrl":"10.1016/j.jpdc.2024.104939","url":null,"abstract":"<div><p>Loops take up most of the time of computer programs, so optimizing them so that they run in the shortest time possible is a continuous task. However, this task is not trivial; on the contrary, it is an open area of research since many irregular loops are hard to parallelize. Generally, these loops have loop-carried (DOACROSS) dependencies and the appearance of dependencies could depend on the context. Many techniques have been studied to be able to parallelize these loops efficiently; however, for example in the OpenMP standard there is no efficient way to parallelize them. This article presents Speculative Task Execution (STE), a technique that enables the execution of OpenMP tasks in a speculative way to accelerate certain hot-code regions (such as loops) marked by OpenMP directives. It also presents a detailed analysis of the application of Hardware Transactional Memory (HTM) support for executing tasks speculatively and describes a careful evaluation of the implementation of STE using HTM on modern machines. In particular, we consider the scenario in which speculative tasks are generated by the OpenMP <span>taskloop</span> construct (<em>Speculative Taskloop (STL)</em>). As a result, it provides evidence to support several important claims about the performance of STE over HTM in modern processor architectures. 
Experimental results reveal that: (a) by implementing STL on top of HTM for hot-code regions, speed-ups of up to 5.39× can be obtained on IBM POWER8 and of up to 2.41× on Intel processors using 4 cores; and (b) STL-ROT, a variant of STL using rollback-only transactions (ROTs), achieves speed-ups of up to 17.70× on an IBM POWER9 processor using 20 cores.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104939"},"PeriodicalIF":3.4,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141411270","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Daniel Wladdimiro , Luciana Arantes , Pierre Sens , Nicolás Hidalgo
{"title":"PA-SPS: A predictive adaptive approach for an elastic stream processing system","authors":"Daniel Wladdimiro , Luciana Arantes , Pierre Sens , Nicolás Hidalgo","doi":"10.1016/j.jpdc.2024.104940","DOIUrl":"10.1016/j.jpdc.2024.104940","url":null,"abstract":"<div><p>Stream Processing Systems (SPSs) dynamically process input events. Since the input is usually not a constant flow, presenting rate fluctuations, many works in the literature propose to dynamically replicate SPS operators, aiming at reducing the processing bottleneck induced by such fluctuations. However, these SPSs do not consider the problem of load balancing of the replicas or the cost involved in reconfiguring the system whenever the number of replicas changes. We present in this paper a predictive model which, based on input rate variation, execution time of operators, and queued events, dynamically defines the necessary current number of replicas of each operator. A predictor, composed of different models (i.e., mathematical and Machine Learning ones), predicts the input rate. We also propose a Storm-based SPS, named PA-SPS, which uses such a predictive model, not requiring reboot reconfiguration when the number of operator replicas changes. PA-SPS also implements a load balancer that distributes incoming events evenly among replicas of an operator. 
We have conducted experiments on Google Cloud Platform (GCP) to evaluate PA-SPS using real traffic traces of different applications and also compared it with Storm and other existing SPSs.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104940"},"PeriodicalIF":3.8,"publicationDate":"2024-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141401072","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Umer Zukaib , Xiaohui Cui , Chengliang Zheng , Dong Liang , Salah Ud Din
{"title":"Meta-Fed IDS: Meta-learning and Federated learning based fog-cloud approach to detect known and zero-day cyber attacks in IoMT networks","authors":"Umer Zukaib , Xiaohui Cui , Chengliang Zheng , Dong Liang , Salah Ud Din","doi":"10.1016/j.jpdc.2024.104934","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104934","url":null,"abstract":"<div><p>The Internet of Medical Things (IoMT) is a transformative fusion of medical sensors, equipment, and the Internet of Things, positioned to transform healthcare. However, security and privacy concerns hinder widespread IoMT adoption, intensified by the scarcity of high-quality datasets for developing effective security solutions. Addressing these challenges, we propose a novel framework for cyberattack detection in dynamic IoMT networks. This framework integrates Federated Learning with Meta-learning, employing a multi-phase architecture for identifying known attacks, and incorporates advanced clustering and biased classifiers to address zero-day attacks. The framework's deployment is adaptable to dynamic and diverse environments, utilizing an Infrastructure-as-a-Service (IaaS) model on the cloud and a Software-as-a-Service (SaaS) model on the fog end. To reflect real-world scenarios, we introduce a specialized IoMT dataset. Our experimental results indicate high accuracy and low misclassification rates, demonstrating the framework's capability in detecting cyber threats in complex IoMT environments. 
This approach shows significant promise in bolstering cybersecurity in advanced healthcare technologies.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104934"},"PeriodicalIF":3.8,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141302619","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DRACO: Distributed Resource-aware Admission Control for large-scale, multi-tier systems","authors":"Domenico Cotroneo, Roberto Natella, Stefano Rosiello","doi":"10.1016/j.jpdc.2024.104935","DOIUrl":"https://doi.org/10.1016/j.jpdc.2024.104935","url":null,"abstract":"<div><p>Modern distributed systems are designed to manage overload conditions by using <em>overload control</em> techniques to throttle the excess traffic that cannot be served. However, the adoption of large-scale NoSQL datastores makes systems vulnerable to <em>unbalanced overloads</em>, where specific datastore nodes are overloaded because of hot-spot resources and hogs. In this paper, we propose DRACO, a novel overload control solution that is aware of data dependencies between the application and the datastore tiers. DRACO performs selective admission control of application requests, by only dropping the ones that map to resources on overloaded datastore nodes, while achieving high resource utilization on non-overloaded datastore nodes. We evaluate DRACO on two case studies with high availability and performance requirements, a virtualized IP Multimedia Subsystem and a distributed fileserver. 
Results show that the solution can achieve high performance and resource utilization even under extreme overload conditions, up to 100x the engineered capacity.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"192 ","pages":"Article 104935"},"PeriodicalIF":3.8,"publicationDate":"2024-06-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000996/pdfft?md5=47aadc5c325c36c8ff181fd763795f30&pid=1-s2.0-S0743731524000996-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141291810","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Front Matter 1 - Full Title Page (regular issues)/Special Issue Title page (special issues)","authors":"","doi":"10.1016/S0743-7315(24)00094-7","DOIUrl":"https://doi.org/10.1016/S0743-7315(24)00094-7","url":null,"abstract":"","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"191 ","pages":"Article 104930"},"PeriodicalIF":3.8,"publicationDate":"2024-06-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.sciencedirect.com/science/article/pii/S0743731524000947/pdfft?md5=2a0c1e248048475ac142cf8a9af19128&pid=1-s2.0-S0743731524000947-main.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141240483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}