面向I/O密集型应用的任务池并行I/O范式

Jianjiang Li, Lin Yan, Zhe Gao, D. Hei
{"title":"面向I/O密集型应用的任务池并行I/O范式","authors":"Jianjiang Li, Lin Yan, Zhe Gao, D. Hei","doi":"10.1109/ISPA.2009.20","DOIUrl":null,"url":null,"abstract":"In regards to applications like 3D seismic migration, it is quite important to improve the I/O performance within an cluster computing system. Such seismic data processing applications are the I/O intensive applications. For example, large 3D data volume cannot be hold totally in computer memories. Therefore the input data files have to be divided into many fine-grained chunks. Intermediate results are written out at various stages during the execution, and final results are written out by the master process. This paper describes a novel manner for optimizing the parallel I/O data access strategy and load balancing for the above-mentioned particular program model. The optimization, based on the application defined API, reduces the number of I/O operations and communication (as compared to the original model). This is done by forming groups of threads with \"group roots\", so to speak, that read input data (determined by an index retrieved from the master process) and then send it to their group members. In the original model, each process/thread reads the whole input data and outputs its own results. Moreover the loads are balanced, for the on-line dynamic scheduling of access request to process the migration data. Finally, in the actual performance test, the improvement of performance is often more than 60% by comparison with the original model.","PeriodicalId":346815,"journal":{"name":"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications","volume":"53 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"A Task-Pool Parallel I/O Paradigm for an I/O Intensive Application\",\"authors\":\"Jianjiang Li, Lin Yan, Zhe Gao, D. Hei\",\"doi\":\"10.1109/ISPA.2009.20\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In regards to applications like 3D seismic migration, it is quite important to improve the I/O performance within an cluster computing system. Such seismic data processing applications are the I/O intensive applications. For example, large 3D data volume cannot be hold totally in computer memories. Therefore the input data files have to be divided into many fine-grained chunks. Intermediate results are written out at various stages during the execution, and final results are written out by the master process. This paper describes a novel manner for optimizing the parallel I/O data access strategy and load balancing for the above-mentioned particular program model. The optimization, based on the application defined API, reduces the number of I/O operations and communication (as compared to the original model). This is done by forming groups of threads with \\\"group roots\\\", so to speak, that read input data (determined by an index retrieved from the master process) and then send it to their group members. In the original model, each process/thread reads the whole input data and outputs its own results. Moreover the loads are balanced, for the on-line dynamic scheduling of access request to process the migration data. Finally, in the actual performance test, the improvement of performance is often more than 60% by comparison with the original model.\",\"PeriodicalId\":346815,\"journal\":{\"name\":\"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications\",\"volume\":\"53 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2009-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISPA.2009.20\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 IEEE International Symposium on Parallel and Distributed Processing with Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISPA.2009.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

对于像3D地震迁移这样的应用程序,提高集群计算系统中的I/O性能是非常重要的。这种地震数据处理应用程序是I/O密集型应用程序。例如,大的3D数据量不能完全保存在计算机存储器中。因此,必须将输入数据文件划分为许多细粒度的块。中间结果在执行的各个阶段被写出来,最终结果由主进程写出来。本文针对上述特定的程序模型,提出了一种优化并行I/O数据访问策略和负载均衡的新方法。基于应用程序定义的API的优化减少了I/O操作和通信的数量(与原始模型相比)。这是通过形成具有“组根”的线程组来实现的,也就是说,线程组读取输入数据(由从主进程检索的索引确定),然后将其发送给它们的组成员。在原始模型中,每个进程/线程读取整个输入数据并输出自己的结果。并且负载均衡,实现访问请求的在线动态调度,处理迁移数据。最后,在实际性能测试中,与原模型相比,性能的提高往往在60%以上。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A Task-Pool Parallel I/O Paradigm for an I/O Intensive Application
In regards to applications like 3D seismic migration, it is quite important to improve the I/O performance within an cluster computing system. Such seismic data processing applications are the I/O intensive applications. For example, large 3D data volume cannot be hold totally in computer memories. Therefore the input data files have to be divided into many fine-grained chunks. Intermediate results are written out at various stages during the execution, and final results are written out by the master process. This paper describes a novel manner for optimizing the parallel I/O data access strategy and load balancing for the above-mentioned particular program model. The optimization, based on the application defined API, reduces the number of I/O operations and communication (as compared to the original model). This is done by forming groups of threads with "group roots", so to speak, that read input data (determined by an index retrieved from the master process) and then send it to their group members. In the original model, each process/thread reads the whole input data and outputs its own results. Moreover the loads are balanced, for the on-line dynamic scheduling of access request to process the migration data. Finally, in the actual performance test, the improvement of performance is often more than 60% by comparison with the original model.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信