Using the bag-of-tasks model with centralized storage for distributed sorting of large data array
S. Vostokin, I. Bobyleva
Collection of selected papers of the III International Conference on Information Technology and Nanotechnology, 2019
DOI: 10.18287/1613-0073-2019-2416-199-203 (https://doi.org/10.18287/1613-0073-2019-2416-199-203)
Citations: 0
Abstract
The article discusses the application of the bag-of-tasks programming model to the problem of sorting a large data array. This problem was chosen because its algorithmic structure is shared by many problems in data analysis, including correlation analysis, frequency analysis, and data indexing. The sorting algorithm sorts the data block by block and then merges the blocks pairwise. When sorting is complete, the data in the blocks form a single ordered sequence. The order of the sorting and merging tasks is set by a static directed acyclic graph. The algorithm is implemented in C++ using the MPI library, with the data blocks stored centrally on the master process. A feature of the implementation is that blocks are transferred between the master and the worker MPI processes for each task. An experimental study confirmed the hypothesis that the intensive data exchange resulting from the centralized nature of the bag-of-tasks model does not lead to a loss of performance. The data processing model makes it possible to relax the technical requirements for the software and hardware.
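
The block-by-block sort followed by pairwise merging can be illustrated with a minimal single-process sketch. The abstract does not spell out the task graph, so the pair order used here (every pair i < j, in lexicographic order) is an assumption, chosen because it is one order that provably leaves the blocks globally ordered. The names `Block`, `merge_split`, and `sort_blocks` are hypothetical, and the MPI transfer of blocks between master and workers is omitted: each call below stands for one task that a worker would receive, process, and send back.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using Block = std::vector<int>;  // hypothetical block type, for illustration

// One "merge" task: merge two sorted blocks, then split the result so that
// `lo` keeps the smallest lo.size() elements and `hi` keeps the rest.
void merge_split(Block& lo, Block& hi) {
    assert(std::is_sorted(lo.begin(), lo.end()));
    assert(std::is_sorted(hi.begin(), hi.end()));
    Block merged;
    merged.reserve(lo.size() + hi.size());
    std::merge(lo.begin(), lo.end(), hi.begin(), hi.end(),
               std::back_inserter(merged));
    const auto cut = merged.begin() + static_cast<std::ptrdiff_t>(lo.size());
    std::copy(merged.begin(), cut, lo.begin());
    std::copy(cut, merged.end(), hi.begin());
}

// Runs the tasks sequentially in an order consistent with their dependencies
// (assumed schedule, not necessarily the one used in the paper).
void sort_blocks(std::vector<Block>& blocks) {
    // Stage 1: independent "sort" tasks, one per block.
    for (Block& b : blocks) std::sort(b.begin(), b.end());
    // Stage 2: "merge" tasks for every pair i < j. After the pass for a
    // given i, block i holds the smallest remaining elements.
    for (std::size_t i = 0; i < blocks.size(); ++i)
        for (std::size_t j = i + 1; j < blocks.size(); ++j)
            merge_split(blocks[i], blocks[j]);
}
```

Calling `sort_blocks` on the blocks {9, 4, 7}, {1, 8, 2}, {6, 3, 5} leaves them as {1, 2, 3}, {4, 5, 6}, {7, 8, 9}. With n blocks this schedule performs n sort tasks and n(n-1)/2 merge tasks. In a bag-of-tasks setting, tasks that touch disjoint blocks could be handed to different workers at the same time.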