A Self-Optimizing Computation Partitioning Algorithm for Distributed Many-Task Computing

Huashan Yu, Yingnan Li, Xianguo Wu, Jian Xiao, Xiaoming Li
Published in: 2010 Fifth Annual ChinaGrid Conference
Publication date: 2010-07-16
DOI: 10.1109/ChinaGrid.2010.51 (https://doi.org/10.1109/ChinaGrid.2010.51)
Citations: 3

Abstract

Many-task computing (MTC) is a practical paradigm for developing loosely coupled, complex scientific applications. In this paradigm, computation on a large dataset is decomposed into tasks that are expected to execute in parallel on dynamically allocated computing resources. These tasks pass data via files, and each one executes an existing program on one dataset element. Task scheduling is a key issue in enabling MTC on parallel platforms such as large-scale clusters, Grids and Clouds. Current solutions mainly focus on maximizing the number of parallel computing resources utilized. This paper proposes a configurable MTC model that aims to minimize an MTC computation’s turnaround time with as few resources as possible. The primary strategy is to coalesce tasks into task-sequences using application-specific expertise, and to assign tasks at the granularity of task-sequences. Based on this model, a self-optimizing task partitioning algorithm has been devised for scheduling tasks in MTC. It separates task assignment from resource allocation, and makes a tradeoff among maximizing utilized resources, balancing workload and reducing computation-scheduling overhead. The algorithm has been implemented in Harmonia, a software platform developed by Peking University for enabling MTC on large-scale distributed platforms. Both the configurable MTC model and the self-optimizing task partitioning algorithm were evaluated with a genome alternative splicing application, and the experimental results demonstrate the model’s practicality.
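The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of coalescing tasks into task-sequences and assigning work at sequence granularity. It is not the paper's algorithm and not Harmonia's API. The names `coalesce` and `assign`, the application-supplied `key` and `cost` functions, and the greedy longest-first heuristic are all assumptions made for illustration.

```python
from collections import defaultdict
import heapq


def coalesce(tasks, key):
    """Group tasks into task-sequences.

    `key` is a hypothetical application-supplied function: tasks that map
    to the same key are placed in the same sequence, in input order.
    """
    groups = defaultdict(list)
    for task in tasks:
        groups[key(task)].append(task)
    return list(groups.values())


def assign(sequences, cost, n_workers):
    """Assign whole sequences to workers (assumed heuristic, not the paper's).

    Sequences are taken in order of decreasing estimated cost, and each one
    goes to the worker with the smallest load so far. `cost` is a
    hypothetical per-task cost estimate.
    """
    heap = [(0.0, w) for w in range(n_workers)]  # (current load, worker id)
    heapq.heapify(heap)
    plan = {w: [] for w in range(n_workers)}

    def seq_cost(seq):
        return sum(cost(t) for t in seq)

    for seq in sorted(sequences, key=seq_cost, reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker
        plan[w].append(seq)
        heapq.heappush(heap, (load + seq_cost(seq), w))
    return plan
```

Assigning whole sequences rather than single tasks means the scheduler makes one decision per sequence, which is one way to trade a little load balance for lower scheduling overhead, the tradeoff the abstract describes.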