表征和改进英特尔线程构建块的性能

2008 IEEE International Symposium on Workload Characterization Pub Date : 2008-09-30 DOI:10.1109/IISWC.2008.4636091

Gilberto Contreras, M. Martonosi

{"title":"表征和改进英特尔线程构建块的性能","authors":"Gilberto Contreras, M. Martonosi","doi":"10.1109/IISWC.2008.4636091","DOIUrl":null,"url":null,"abstract":"The Intel threading building blocks (TBB) runtime library is a popular C++ parallelization environment (D. Bolton, 2007) that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing parallel tasks across available processors. This not only creates more scalable, portable parallel applications, but also increases programming productivity by allowing programmers to focus their efforts on identifying concurrency rather than worrying about its management. While many applications benefit from dynamic management of parallelism, dynamic management carries parallelization overhead that increases with increasing core counts and decreasing task sizes. Understanding the sources of these overheads and their implications on application performance can help programmers make more efficient use of available parallelism. Clearly understanding the behavior of these overheads is the first step in creating efficient, scalable parallelization environments targeted at future CMP systems. In this paper we study and characterize some of the overheads of the Intel Threading Building Blocks through the use of real-hardware and simulation performance measurements. Our results show that synchronization overheads within TBB can have a significant and detrimental effect on parallelism performance. Random stealing, while simple and effective at low core counts, becomes less effective as application heterogeneity and core counts increase. Overall, our study provides valuable insights that can be used to create more robust, scalable runtime libraries.","PeriodicalId":447179,"journal":{"name":"2008 IEEE International Symposium on Workload Characterization","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2008-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"127","resultStr":"{\"title\":\"Characterizing and improving the performance of Intel Threading Building Blocks\",\"authors\":\"Gilberto Contreras, M. Martonosi\",\"doi\":\"10.1109/IISWC.2008.4636091\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The Intel threading building blocks (TBB) runtime library is a popular C++ parallelization environment (D. Bolton, 2007) that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing parallel tasks across available processors. This not only creates more scalable, portable parallel applications, but also increases programming productivity by allowing programmers to focus their efforts on identifying concurrency rather than worrying about its management. While many applications benefit from dynamic management of parallelism, dynamic management carries parallelization overhead that increases with increasing core counts and decreasing task sizes. Understanding the sources of these overheads and their implications on application performance can help programmers make more efficient use of available parallelism. Clearly understanding the behavior of these overheads is the first step in creating efficient, scalable parallelization environments targeted at future CMP systems. In this paper we study and characterize some of the overheads of the Intel Threading Building Blocks through the use of real-hardware and simulation performance measurements. Our results show that synchronization overheads within TBB can have a significant and detrimental effect on parallelism performance. Random stealing, while simple and effective at low core counts, becomes less effective as application heterogeneity and core counts increase. Overall, our study provides valuable insights that can be used to create more robust, scalable runtime libraries.\",\"PeriodicalId\":447179,\"journal\":{\"name\":\"2008 IEEE International Symposium on Workload Characterization\",\"volume\":\"22 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2008-09-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"127\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2008 IEEE International Symposium on Workload Characterization\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IISWC.2008.4636091\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2008 IEEE International Symposium on Workload Characterization","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2008.4636091","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 127

摘要

Intel线程构建块(TBB)运行库是一种流行的c++并行化环境(D. Bolton, 2007)，它为创建并行应用程序提供了一组方法和模板。通过支持并行任务而不是并行线程，TBB运行时库通过在可用处理器之间动态地重新分配并行任务，提供了改进的性能可伸缩性。这不仅可以创建更可伸缩、可移植的并行应用程序，而且还可以通过允许程序员将精力集中在识别并发性上而不必担心并发性的管理，从而提高编程效率。虽然许多应用程序受益于并行性的动态管理，但动态管理带来的并行化开销会随着核心数量的增加和任务大小的减小而增加。了解这些开销的来源及其对应用程序性能的影响可以帮助程序员更有效地利用可用的并行性。清楚地理解这些开销的行为是创建针对未来CMP系统的高效、可扩展并行化环境的第一步。在本文中，我们通过使用真实硬件和模拟性能测量来研究和表征英特尔线程构建块的一些开销。我们的结果表明，TBB中的同步开销可能会对并行性能产生重大而有害的影响。随机窃取虽然在低核数时简单有效，但随着应用程序的异构性和核数的增加，它的效率就会降低。总的来说，我们的研究提供了有价值的见解，可用于创建更健壮、可扩展的运行时库。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Characterizing and improving the performance of Intel Threading Building Blocks

The Intel threading building blocks (TBB) runtime library is a popular C++ parallelization environment (D. Bolton, 2007) that offers a set of methods and templates for creating parallel applications. Through support of parallel tasks rather than parallel threads, the TBB runtime library offers improved performance scalability by dynamically redistributing parallel tasks across available processors. This not only creates more scalable, portable parallel applications, but also increases programming productivity by allowing programmers to focus their efforts on identifying concurrency rather than worrying about its management. While many applications benefit from dynamic management of parallelism, dynamic management carries parallelization overhead that increases with increasing core counts and decreasing task sizes. Understanding the sources of these overheads and their implications on application performance can help programmers make more efficient use of available parallelism. Clearly understanding the behavior of these overheads is the first step in creating efficient, scalable parallelization environments targeted at future CMP systems. In this paper we study and characterize some of the overheads of the Intel Threading Building Blocks through the use of real-hardware and simulation performance measurements. Our results show that synchronization overheads within TBB can have a significant and detrimental effect on parallelism performance. Random stealing, while simple and effective at low core counts, becomes less effective as application heterogeneity and core counts increase. Overall, our study provides valuable insights that can be used to create more robust, scalable runtime libraries.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2008 IEEE International Symposium on Workload Characterization

自引率

0.00%

发文量