Pthreads Performance Characteristics on Shared Cache CMP, Private Cache CMP and SMP

2010 Second International Conference on Computer Engineering and Applications. Pub Date: 2010-03-19. DOI: 10.1109/ICCEA.2010.44

I. Tan, I. Chai, Poo Kuan Hoong


Citations: 5

Abstract

With the wide availability of chip multi-processing (CMP), software developers now face the task of effectively parallelizing their code. Once they have identified the areas to parallelize, they need to know the level of code granularity required to ensure profitable execution. The problem is compounded by the variety of hardware available. In this paper, we present a novel approach for a fair comparison of hardware configurations, simulating them by configuring a pair of quad-core processors. The simulated configurations represent shared cache CMP, private cache CMP and symmetric multiprocessor (SMP) environments. We then present a modified lmbench micro-benchmark suite to measure the cost of threading on these different hardware configurations. In our empirical studies, we observe that shared cache CMP exhibits better performance when the operating system's load balancer is highly active. However, the measurements also indicate that thread size is an important consideration, because cache thrashing can occur when a cache is shared between processing cores. Private cache CMP and SMP do not exhibit a significant difference in our measurements. The techniques presented can be incorporated into integrated development environments, compilers and potentially even other run-time environments.
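To make "the cost of threading" concrete, the following is a minimal sketch of one way to time a Pthreads create/join pair while optionally pinning the child thread to a chosen core. It is not the authors' modified lmbench code, which the abstract does not show. It assumes Linux with glibc (`pthread_attr_setaffinity_np` is a non-portable GNU extension), and the names `noop` and `avg_create_join_ns` are hypothetical helpers introduced only for illustration. Which CPU numbers share a cache depends on the machine and is not specified in the abstract.

```c
#define _GNU_SOURCE            /* needed for cpu_set_t and the *_np affinity calls */
#include <pthread.h>
#include <sched.h>
#include <time.h>

/* Empty thread body: only the create/join overhead is of interest here. */
void *noop(void *arg) { return arg; }

/*
 * Average nanoseconds per pthread_create + pthread_join pair.
 * child_cpu >= 0 asks for the child thread to be created on that CPU
 * (assumption: the CPU exists and is allowed for this process);
 * child_cpu < 0 leaves placement to the OS scheduler.
 * Returns -1.0 on any failure or if iterations <= 0.
 */
double avg_create_join_ns(int iterations, int child_cpu) {
    struct timespec start, end;
    pthread_attr_t attr;

    if (iterations <= 0) return -1.0;
    if (pthread_attr_init(&attr) != 0) return -1.0;

    if (child_cpu >= 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(child_cpu, &set);
        if (pthread_attr_setaffinity_np(&attr, sizeof(set), &set) != 0) {
            pthread_attr_destroy(&attr);
            return -1.0;
        }
    }

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        pthread_t t;
        if (pthread_create(&t, &attr, noop, NULL) != 0 ||
            pthread_join(t, NULL) != 0) {
            pthread_attr_destroy(&attr);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    pthread_attr_destroy(&attr);

    /* Convert the elapsed interval to nanoseconds and average it. */
    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / iterations;
}
```

Compiled with `-pthread`, a caller could compare, for example, `avg_create_join_ns(10000, 1)` against `avg_create_join_ns(10000, 4)`, choosing CPU numbers that do or do not share a cache with the parent's core on the machine at hand. Passing `-1` leaves placement to the operating system's load balancer, the case in which the abstract reports shared cache CMP performing better.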
