Improving Performance of the Irregular Data Intensive Application with Small Computation Workload for CMPs

Guan Zhimin, Fu Yinxia, Zheng Ninghan, Zhang Jianxun, Cai Min, Huang Yan, Tang Jie
DOI: 10.1109/ICPPW.2011.7
Published in: 2011 40th International Conference on Parallel Processing Workshops
Publication date: 2011-09-13
Citations: 5

Abstract

The data needs of scientific and commercial applications from a diverse range of fields have been increasing exponentially in recent years. Although traditional systems work well for computation that requires limited data handling, CMPs in cloud computing may deliver poor performance for computation that requires large amounts of intensive data. Conventional helper-thread techniques try to reduce the high performance overheads, but they cannot improve the performance of irregular data-intensive applications with a small computation workload. Our goal is to provide a novel solution that improves application performance in data-intensive computing environments. By introducing three parameters to the helper thread, the prepush look-ahead size K, the prepush block size P, and the synchronization block size B, we expect to reduce the overheads introduced by the traditional helper thread and leave the computing resources free to perform useful prefetch work. As a starting point, we design the KPB interleaved data prepush algorithm, and use Q6600 and IBM 5110 multi-core computers as our test platforms to study the behavior of benchmarks from the SPEC2006 and Olden suites. We construct helper threads for mcf from SPEC2006 and for mst and em3d from Olden using our method; the average speedups are 1.23, 1.32, and 1.09 respectively on the Q6600 machine, and 1.28, 1.35, and 1.23 respectively on the other machine. Compared with the AP and PV methods, our method has less negative impact than both, and our KPB method is also better than AP and PV in prefetching timeliness and control ability.
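The abstract's three parameters (look-ahead size K, prepush block size P, synchronization block size B) suggest a schedule in which the helper thread runs K elements ahead of the main thread, prepushes P elements at a time, and re-synchronizes after the main thread consumes each block of B elements. The following is a minimal sketch of such an interleaved schedule; the function name, the sequential simulation, and the exact synchronization rule are illustrative assumptions, not the paper's implementation.

```python
def kpb_schedule(n_elements, K, P, B):
    """Simulate the order in which a hypothetical KPB helper thread touches
    elements of a data structure of n_elements items.

    The helper starts K elements ahead of the main thread, prepushes up to
    P elements per step, and re-synchronizes its position each time the
    main thread finishes a block of B elements.
    """
    prepushed = []          # indices the helper thread touches, in order
    main_pos = 0            # next element the main thread will compute on
    helper_pos = K          # helper runs K elements ahead (assumption)
    while main_pos < n_elements:
        # Helper prepushes the next P elements within its window.
        for i in range(helper_pos, min(helper_pos + P, n_elements)):
            prepushed.append(i)
        helper_pos = min(helper_pos + P, n_elements)
        # Main thread consumes one synchronization block of B elements.
        main_pos += B
        # Re-synchronize: helper jumps forward so it stays K ahead of main.
        helper_pos = max(helper_pos, min(main_pos + K, n_elements))
    return prepushed

# With K=2, P=2, B=4 over 10 elements, the helper touches [2, 3, 6, 7]:
# it skips elements the main thread has already passed at each sync point.
print(kpb_schedule(10, K=2, P=2, B=4))
```

In a real helper thread, each "prepush" would be an actual memory touch or prefetch of the irregular structure's next nodes, and the synchronization every B elements is what bounds how far the helper can run ahead or fall behind.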