{"title":"Sleptsov Net Computing resolves problems of modern supercomputing revealed by Jack Dongarra in his Turing Award talk in November 2022","authors":"D. Zaitsev","doi":"10.1080/17445760.2023.2201002","DOIUrl":"https://doi.org/10.1080/17445760.2023.2201002","url":null,"abstract":"In his Turing Award Lecture, Jack Dongarra revealed a drastic problem of modern HPC: low efficiency on a real-life task mixture, 0.8% for the best supercomputer, Frontier. The paradigm of Sleptsov net computing, born in Ukraine, resolves this problem with a computing memory hardware implementation of an entirely graphical language of concurrent programming, supplied with a framework of formal methods for verification of concurrent programs.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44992770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Blockchain for the supply chain of the Italian craft beer sector: tracking and discount coupons","authors":"Lorenzo Ariemma, N. De Carlo, Diego Pennino, M. Pizzonia, A. Vitaletti, Marco Zecchini","doi":"10.1080/17445760.2023.2190974","DOIUrl":"https://doi.org/10.1080/17445760.2023.2190974","url":null,"abstract":"","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-04-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49431822","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Achieving maximum speedup in multi-level acceleration for massive coronavirus testing","authors":"Keqin Li, Bo Yang","doi":"10.1080/17445760.2023.2190975","DOIUrl":"https://doi.org/10.1080/17445760.2023.2190975","url":null,"abstract":"It is well and widely known that sample pooling could provide an effective and efficient way for fast coronavirus testing among massive asymptomatic individuals. The method of multi-level acceleration for asymptomatic COVID-19 screening has been introduced, and for one and two levels, the optimal group sizes have been obtained. However, there are still multiple challenges. First, it is not clear how to find the optimal group sizes for three or more levels. Second, there is a lack of closed-form expressions for the optimal group sizes for two or more levels. Third, it is not clear how to determine the optimal number of levels. And last, it is not known what the maximum achievable speedup is. The motivation of this paper is to address all the above challenges. The optimization of a hierarchical pooling strategy includes its number of levels and the group size of each level. In this paper, based on multi-variable optimization and Taylor approximation, we are able to derive closed-form expressions for the optimal number of levels , the optimal group sizes , ,…, , and the maximum possible speedup of a hierarchical pooling strategy of , where is the fraction of infected people. The above speedup is nearly a linear function of the reciprocal of , in the sense that it is asymptotically greater than any sub-linear function of the reciprocal of for any small . Using the results in this paper, we can quickly and easily predict the performance of an optimal hierarchical pooling strategy. For instance, if the fraction of infected people is 0.0001, an 8-level hierarchical pooling strategy can achieve a speedup of nearly 400.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45813071","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Position control of a ball balancer system using Particle Swarm Optimization, BAT and Flower Pollination Algorithm","authors":"Ajit Kumar Sharma, B. Bhushan","doi":"10.1080/17445760.2023.2190972","DOIUrl":"https://doi.org/10.1080/17445760.2023.2190972","url":null,"abstract":"The design and control of the 2DoF Ball Balancer system are presented in this work. The ball balancer is a feedback-based underactuated system that is nonlinear, multivariate, and electromechanical. The proportional derivative (PD) controller is optimized by using the Bat Algorithm (BA), Particle Swarm Optimization (PSO), and the Flower Pollination Algorithm (FPA) in this research. By regulating the plate inclination angle, the suggested controller accomplishes self-balancing control for a ball on the plate. The modelling of the ball balancer system is accomplished using a 2DoF ball balancer system. In addition, the Bat Algorithm, Particle Swarm Optimization, and the Flower Pollination Algorithm are used to analyze the state of a process autonomously. The system's model is created using MATLAB/Simulink approaches, and the results present the system with a steady and controllable output for ball balancing and plate angle control. Graphical abstract: The authors control the position of the ball balancer using the PD controller and optimize the controller parameters through FPA, BA, and PSO.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46556495","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Design of a modern fast Fourier transform and cache effective bit-reversal algorithm","authors":"Adam Simek, I. Šimeček","doi":"10.1080/17445760.2023.2179049","DOIUrl":"https://doi.org/10.1080/17445760.2023.2179049","url":null,"abstract":"This article deals with efficient vectorization of the fast Fourier transform algorithm while focusing on Cooley–Tukey versions with power-of-two radixes. Aside from examples of optimizations for 256 and 512-bit vectors, this work also discusses relations between individual radix-based versions, vectorization and OpenMP threading. Ideas are progressing into a timeless design of the FFT algorithm, which can work with any vector size and radix version through conversion into radix-2 output permutation. Furthermore, the implementation of the Cache Optimized Bit-Reversal algorithm, which doubles the performance of its predecessor, is introduced.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-03-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45621885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Big data processing with 1D-Crosspoint Arrays","authors":"Taeyoung An, A. Oruç","doi":"10.1080/17445760.2023.2172574","DOIUrl":"https://doi.org/10.1080/17445760.2023.2172574","url":null,"abstract":"Increased chip densities offer massive computation power to deal with fundamental big data operations such as searching and sorting. At the same time, the proliferation of processing elements (PEs) in such multicore chips together with the employment of more aggressive parallel algorithms cause the amount of space needed for interprocessor communications to dominate the overall chip space, potentially resulting in reduced computational efficiency. To overcome this issue, this paper introduces a new architecture that uses simple crosspoint switches to pair PEs instead of a complex interconnection network. This new architecture may be viewed as a ‘quadratic’ array of processors as it uses PEs rather than PEs as in linear array processor models. The switches between adjacent PEs are created using a cyclic permutation wiring idea with PEs and as many crosspoints. We demonstrate the versatility of this new parallel architecture by designing fast algorithms to sort and search a list of n elements with it. In particular, we show that finding a minimum, maximum, and searching a list of n elements can all be performed on this parallel architecture in time with additional elementary logic gates with fan-in and in time with fan-in. We further show that sorting a list of n elements can also be carried out in time using additional elementary logic gates with fan-in and threshold logic gates on the same parallel architecture. The sorting time increases to if only elementary logic gates with fan-in are used. In addition, we establish how similar queries can be handled within the same order of time complexities. We use this new parallel architecture to perform sorting and searching on big data on three different models. The first of these models provides an efficient implementation of enumeration sorting and searching for moderate size big data sets. The second model offers increased parallelism by replication of the new parallel architecture but its hardware complexity limits its use to moderate size big data sets as well. The third model removes this limitation by introducing a tradeoff parameter between the time and hardware complexity of the overall computation, thereby providing an optimal use of available resources within a given chip-set space.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44546716","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"VSCT algorithm for graph partitioning based on volume, size, cuts and time","authors":"Chayma Sakouhi, Abir Khaldi, H. Ghézala","doi":"10.1080/17445760.2023.2174540","DOIUrl":"https://doi.org/10.1080/17445760.2023.2174540","url":null,"abstract":"Dealing with large-scale graphs requires an efficient graph partitioner that produces balanced partitions with fewer cut edges/vertices in a reasonable amount of time. Although several algorithms have been proposed, they remain insufficient. Even with the continuous growth of graph volume, they do not consider the graph volume during graph partitioning. Therefore, these algorithms generate an imbalanced workload. We propose a graph partitioning algorithm, VSCT, based essentially on four key metrics: Volume, Size, Cuts, and Time, to maintain high-quality graph partitioning. Using real-world datasets, we show that VSCT achieves efficient partitioning quality compared with the existing graph partitioning algorithms.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42068269","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The routing algorithms for maximum probability paths under degree constraints in networks","authors":"Yinhui Liu, Shurong Zhang, Lin Chen, Kan He, Weihua Yang","doi":"10.1080/17445760.2023.2175360","DOIUrl":"https://doi.org/10.1080/17445760.2023.2175360","url":null,"abstract":"Driven by the rapid development of information technology, the network has been researched extensively and the efficient routing design has become particularly important and valuable. Since the data transmission in networks is mainly based on the establishment of communication, the instability of links and the capacity of nodes should be considered. Motivated by this, considering the routing optimization under the degree constraints, we formulate the problem of designing paths with maximum probability in the network from a single source node to all other nodes. Then we propose five polynomial-time algorithms for this problem by using the technical methods of bidirectional optimization, probability first strategy, the selection of the node with the maximum number of degrees and the restriction of the depth. In addition, the simulations and comparative analysis show that the algorithms have obvious advantages in practice.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48313248","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The generalized measure of edge fault tolerance in exchanged 3-ary n-cube","authors":"Yayu Yang, Mingzu Zhang, J. Meng","doi":"10.1080/17445760.2023.2172575","DOIUrl":"https://doi.org/10.1080/17445760.2023.2172575","url":null,"abstract":"The exchanged 3-ary n-cube , proposed by Lv et al. in 2021, is obtained by removing edges from a 3-ary n-cube , where r + s + t + 1 = n. The topological interconnection network of a multiprocessor system can be modeled as a connected graph. Analyzing the fault tolerance of its topological structure is critical in the course of design and maintenance of it. Given a connected graph G, let F be an edge subset of G. F is called an h-edge-cut of G, if G−F is disconnected and each remaining component has the minimum degree of at least h. The h-edge-connectivity is the minimum cardinality of all h-edge-cuts of G. For and , in this paper, we determine the -edge-connectivity of exchanged 3-ary n-cubes, , and prove the exact values .","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41820052","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimization strategies for GPUs: an overview of architectural approaches","authors":"Alessio Masola, Nicola Capodieci","doi":"10.1080/17445760.2023.2173752","DOIUrl":"https://doi.org/10.1080/17445760.2023.2173752","url":null,"abstract":"Modern Cyber Physical Systems (CPS) applications require hardware capable of optimized performance-per-watt efficiency. This is usually obtained through massively parallel accelerators such as the GPU. Recent research is therefore investigating novel designs to optimize GPU energy consumption and performance for various applications in the Internet-of-things, autonomous navigation, and industrial robotics domains. This paper presents a survey of the current state-of-the-art approaches for optimizing GPU performance metrics; we present a complete and up-to-date summary of ideas, mechanisms, and potential improvements for next-generation GPU devices.","PeriodicalId":45411,"journal":{"name":"International Journal of Parallel Emergent and Distributed Systems","volume":null,"pages":null},"PeriodicalIF":1.1,"publicationDate":"2023-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46677906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}