Using the bag-of-tasks model with centralized storage for distributed sorting of large data array

S. Vostokin, I. Bobyleva
Published in: Collection of selected papers of the III International Conference on Information Technology and Nanotechnology (Journal Article), 2019-01-01
DOI: 10.18287/1613-0073-2019-2416-199-203 (https://doi.org/10.18287/1613-0073-2019-2416-199-203)
Citations: 0

Abstract

The article discusses the application of the bag-of-tasks programming model to the problem of sorting a large data array. Sorting was chosen because its algorithmic structure is shared by many data-analysis problems, including correlation analysis, frequency analysis, and data indexing. The algorithm sorts the array block by block and then merges the blocks pairwise. When sorting finishes, the data in the blocks form an ordered sequence. The order of the sorting and merging tasks is set by a static directed acyclic graph. The algorithm is implemented in C++ using the MPI library, with the data blocks stored centrally on the master process. A feature of the implementation is that blocks are transferred between the master and the worker MPI processes for each task. An experimental study confirmed the hypothesis that the intensive data exchange resulting from the centralized nature of the bag-of-tasks model does not lead to a loss of performance. The data processing model makes it possible to relax the technical requirements on the software and hardware.
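The block-by-block sort followed by pairwise merging can be sketched in plain C++ as below. This is a minimal single-process illustration under stated assumptions, not the authors' MPI implementation: the names `Task`, `build_task_list`, `merge_split`, `run_tasks`, and `split_into_blocks` are hypothetical, the all-pairs merge order (every pair `(i, j)` with `i < j`) is one assumed schedule that respects the task dependencies, and the copies in `run_tasks` only stand in for the per-task block transfer between master and worker.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <utility>
#include <vector>

using Block = std::vector<int>;

// A task refers to one or two blocks held by the master (hypothetical layout).
struct Task {
    enum Kind { Sort, Merge } kind;
    std::size_t i;  // first block index
    std::size_t j;  // second block index (meaningful for Merge only)
};

// Static task order (assumed): sort every block, then merge each pair i < j.
// Listing the tasks in this order is one topological order of the task graph.
std::vector<Task> build_task_list(std::size_t block_count) {
    std::vector<Task> tasks;
    for (std::size_t i = 0; i < block_count; ++i)
        tasks.push_back({Task::Sort, i, i});
    for (std::size_t i = 0; i < block_count; ++i)
        for (std::size_t j = i + 1; j < block_count; ++j)
            tasks.push_back({Task::Merge, i, j});
    return tasks;
}

// Worker side of a merge task: both inputs are already sorted. The lower
// block keeps the smallest elements, the upper block the rest; sizes are kept.
void merge_split(Block& lower, Block& upper) {
    Block merged;
    merged.reserve(lower.size() + upper.size());
    std::merge(lower.begin(), lower.end(), upper.begin(), upper.end(),
               std::back_inserter(merged));
    const auto mid = merged.begin() + static_cast<std::ptrdiff_t>(lower.size());
    std::copy(merged.begin(), mid, lower.begin());
    std::copy(mid, merged.end(), upper.begin());
}

// Master side: copies blocks out to a "worker" and stores the results back.
// The copies stand in for the send/receive of a real distributed program.
void run_tasks(std::vector<Block>& blocks) {
    for (const Task& t : build_task_list(blocks.size())) {
        if (t.kind == Task::Sort) {
            Block work = blocks[t.i];            // "send" block to worker
            std::sort(work.begin(), work.end());
            blocks[t.i] = std::move(work);       // "receive" sorted block
        } else {
            Block a = blocks[t.i];               // "send" both blocks
            Block b = blocks[t.j];
            merge_split(a, b);
            blocks[t.i] = std::move(a);          // "receive" both results
            blocks[t.j] = std::move(b);
        }
    }
}

// Splits data into consecutive blocks of at most block_size elements
// (block_size must be greater than zero).
std::vector<Block> split_into_blocks(const std::vector<int>& data,
                                     std::size_t block_size) {
    std::vector<Block> blocks;
    for (std::size_t pos = 0; pos < data.size(); pos += block_size) {
        const std::size_t end = std::min(pos + block_size, data.size());
        blocks.emplace_back(data.begin() + static_cast<std::ptrdiff_t>(pos),
                            data.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return blocks;
}
```

Calling `split_into_blocks(data, 4)` and then `run_tasks(blocks)` leaves the blocks so that reading them in index order gives the sorted array. With this assumed schedule, n blocks produce n sort tasks and n(n-1)/2 merge tasks, and each merge task moves two blocks to a worker and back, which is the kind of intensive exchange the abstract refers to.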