超线程对生产应用程序中处理器资源利用率的影响

2011 18th International Conference on High Performance Computing Pub Date : 2011-12-18 DOI:10.1109/HiPC.2011.6152743

S. Saini, Haoqiang Jin, R. Hood, David Barker, P. Mehrotra, R. Biswas

{"title":"超线程对生产应用程序中处理器资源利用率的影响","authors":"S. Saini, Haoqiang Jin, R. Hood, David Barker, P. Mehrotra, R. Biswas","doi":"10.1109/HiPC.2011.6152743","DOIUrl":null,"url":null,"abstract":"Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem micro-architecture such as the Westmere-EP. HT enables two threads to execute on each core in order to hide latencies related to data access. These two threads can execute simultaneously, filling unused stages in the functional unit pipelines. To aid better understanding of HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new metric of efficiency in order to quantify processor resource utilization and make comparisons of that utilization between single-threading (ST) and HT modes. We also study performance gain using unhalted core cycles, code efficiency of using vector units of the processor, and the impact of HT mode on various shared resources like L2 and L3 cache. Results using four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but does not necessarily translate into overall application performance gain.","PeriodicalId":122468,"journal":{"name":"2011 18th International Conference on High Performance Computing","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"48","resultStr":"{\"title\":\"The impact of hyper-threading on processor resource utilization in production applications\",\"authors\":\"S. Saini, Haoqiang Jin, R. Hood, David Barker, P. Mehrotra, R. Biswas\",\"doi\":\"10.1109/HiPC.2011.6152743\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem micro-architecture such as the Westmere-EP. HT enables two threads to execute on each core in order to hide latencies related to data access. These two threads can execute simultaneously, filling unused stages in the functional unit pipelines. To aid better understanding of HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new metric of efficiency in order to quantify processor resource utilization and make comparisons of that utilization between single-threading (ST) and HT modes. We also study performance gain using unhalted core cycles, code efficiency of using vector units of the processor, and the impact of HT mode on various shared resources like L2 and L3 cache. Results using four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but does not necessarily translate into overall application performance gain.\",\"PeriodicalId\":122468,\"journal\":{\"name\":\"2011 18th International Conference on High Performance Computing\",\"volume\":\"73 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"48\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 18th International Conference on High Performance Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HiPC.2011.6152743\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 18th International Conference on High Performance Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HiPC.2011.6152743","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 48

摘要

英特尔在基于奔腾和Nehalem微架构(如Westmere-EP)的处理器中提供超线程(HT)功能。HT允许在每个核心上执行两个线程，以隐藏与数据访问相关的延迟。这两个线程可以同时执行，填充功能单元管道中未使用的阶段。为了帮助更好地理解ht相关问题，我们收集了性能监控单元(PMU)数据(退役指令;不间断的核心循环;L2和L3缓存命中和未命中;向量和标量浮点运算等)。然后，我们使用PMU数据来计算新的效率指标，以便量化处理器资源利用率，并比较单线程(ST)和HT模式之间的利用率。我们还研究了使用不间断核心周期的性能增益，使用处理器矢量单元的代码效率，以及HT模式对各种共享资源(如L2和L3缓存)的影响。NASA科学家使用计算流体动力学(CFD)的四个全尺寸、生产质量的科学应用程序，结果表明，HT通常可以提高处理器资源利用率，但不一定转化为整体应用程序性能的提高。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

The impact of hyper-threading on processor resource utilization in production applications

Intel provides Hyper-Threading (HT) in processors based on its Pentium and Nehalem micro-architecture such as the Westmere-EP. HT enables two threads to execute on each core in order to hide latencies related to data access. These two threads can execute simultaneously, filling unused stages in the functional unit pipelines. To aid better understanding of HT-related issues, we collect Performance Monitoring Unit (PMU) data (instructions retired; unhalted core cycles; L2 and L3 cache hits and misses; vector and scalar floating-point operations, etc.). We then use the PMU data to calculate a new metric of efficiency in order to quantify processor resource utilization and make comparisons of that utilization between single-threading (ST) and HT modes. We also study performance gain using unhalted core cycles, code efficiency of using vector units of the processor, and the impact of HT mode on various shared resources like L2 and L3 cache. Results using four full-scale, production-quality scientific applications from computational fluid dynamics (CFD) used by NASA scientists indicate that HT generally improves processor resource utilization efficiency, but does not necessarily translate into overall application performance gain.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 18th International Conference on High Performance Computing

自引率

0.00%

发文量