Workload Balancing Methodology for Data-Intensive Applications with Divisible Load

2011 23rd International Symposium on Computer Architecture and High Performance Computing. Pub Date: 2011-10-26. DOI: 10.1109/SBAC-PAD.2011.15

C. Rosas, A. Sikora, Josep Jorba, Eduardo César


Citations: 10

Abstract

Data-intensive applications are those that explore, query, analyze, and, in general, process very large data sets. In High Performance Computing (HPC), the main performance problem associated with these applications is generally load imbalance or inefficient resource utilization. This paper proposes a methodology for improving the performance of data-intensive applications, based on performing multiple data partitions prior to execution and ordering the data chunks according to their processing times during application execution. As a first step, we consider that a single execution includes multiple related explorations on the same data set. Consequently, we propose to monitor the processing of each exploration and use the data gathered to dynamically tune the performance of the application. The tuning parameters included in the methodology are the partition factor of the data set, the distribution of the resulting data chunks, and the number of processing nodes to be used by the application. The methodology has been initially tested using the well-known bioinformatics tool BLAST, obtaining encouraging results (an improvement of up to 40%).
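
The idea of ordering data chunks by their processing times can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes that the times measured in a previous exploration are available as a dictionary, that chunks are dispatched longest first, and that each chunk goes to the currently least loaded worker (a greedy longest-processing-time heuristic). The function name `schedule_chunks` and its input format are hypothetical.

```python
import heapq


def schedule_chunks(chunk_times, num_workers):
    """Assign chunks to workers, longest measured time first (greedy LPT).

    chunk_times: dict mapping a chunk id to the processing time measured in
    a previous exploration (hypothetical input format, not from the paper).
    Returns (assignment, makespan), where makespan is the largest total
    load placed on any single worker.
    """
    # Order chunks by measured processing time, longest first (assumed order).
    ordered = sorted(chunk_times.items(), key=lambda kv: kv[1], reverse=True)

    # Min-heap of (accumulated load, worker index): the least loaded worker
    # always receives the next chunk, which mimics on-demand dispatch.
    heap = [(0.0, w) for w in range(num_workers)]
    heapq.heapify(heap)

    assignment = {w: [] for w in range(num_workers)}
    for chunk_id, t in ordered:
        load, w = heapq.heappop(heap)
        assignment[w].append(chunk_id)
        heapq.heappush(heap, (load + t, w))

    makespan = max(load for load, _ in heap)
    return assignment, makespan


if __name__ == "__main__":
    # Illustrative times only; these are not measurements from the paper.
    times = {"a": 5, "b": 3, "c": 3, "d": 1}
    print(schedule_chunks(times, num_workers=2))
```

In this sketch, the partition factor would correspond to how many entries `chunk_times` holds and the number of processing nodes to `num_workers`; rerunning the function with different values gives a rough way to compare candidate settings before the next exploration.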

