Improvement of memory bandwidth utilization using OpenMP task with processor affinity

2015 International Symposium on Next-Generation Electronics (ISNE). Pub Date: 2015-05-04. DOI: 10.1109/ISNE.2015.7131947

J. Arul, Chun Huang


Citations: 1

Abstract

CPU design has evolved for more than 30 years since the first x86 microprocessor. Recently, the focus has shifted from raising CPU performance to multi-core architectures. Multi-core processor technology is evolving rapidly, but the memory interface limits how well the needs of multi-core and multi-threaded processors can be met, which is a major challenge for software developers. At run time, the operating system scheduler assigns threads to processor cores dynamically. Current parallel programming research aims only at balancing load and keeping the cores running efficiently. As a result, applications may have poor spatial data locality, and memory bandwidth usage may be uneven because memory access paths differ. The main question of this research is how to obtain maximum memory bandwidth utilization by controlling the processor affinity of threads. Memory bandwidth utilization improved by 62% (from 8786.87 MB/s to 14201.88 MB/s) when appropriate processor affinity was set for thread placement. OpenMP task-level parallelism combined with processor affinity yielded an improvement of 69% (from 8786.87 MB/s to 14802.69 MB/s) using 2 threads. Thus, task-level parallelism combined with processor affinity greatly increases the level of parallelism in an OpenMP parallel programming environment and, as a result, can improve the overall performance of parallel applications.
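The combination described in the abstract, OpenMP tasks running on threads with a fixed placement, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' code: the STREAM-style triad kernel, the function names `triad_tasks` and `triad_mb_per_s`, the chunking scheme, and the choice of `proc_bind(spread)` are all hypothetical, and the abstract does not say which kernel or affinity mechanism was actually used. The actual places are assumed to come from the `OMP_PLACES` environment variable (for example `OMP_PLACES=cores`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel: STREAM-style triad a[i] = b[i] + s * c[i],
 * split into one OpenMP task per chunk of the index range.
 * proc_bind(spread) asks the runtime to bind the two threads to
 * separate places; without OpenMP enabled the pragmas are ignored
 * and the loop simply runs serially. */
static void triad_tasks(double *a, const double *b, const double *c,
                        double s, size_t n, size_t chunk)
{
    assert(chunk > 0);
    #pragma omp parallel num_threads(2) proc_bind(spread)
    #pragma omp single
    {
        /* One thread creates the tasks; both threads execute them. */
        for (size_t start = 0; start < n; start += chunk) {
            size_t end = (n - start < chunk) ? n : start + chunk;
            #pragma omp task firstprivate(start, end)
            for (size_t i = start; i < end; ++i)
                a[i] = b[i] + s * c[i];
        }
    }   /* implicit barrier: all tasks complete before returning */
}

/* Assumed bandwidth accounting (STREAM convention): each triad
 * iteration moves three doubles (two reads, one write). */
static double triad_mb_per_s(size_t n, double seconds)
{
    double bytes = 3.0 * (double)n * (double)sizeof(double);
    return bytes / seconds / 1.0e6;
}
```

To try it, compile with OpenMP enabled (for example `-fopenmp` with GCC), time a call to `triad_tasks` on arrays much larger than the last-level cache, and pass the element count and elapsed seconds to `triad_mb_per_s`. Comparing runs with and without binding (for instance by toggling `OMP_PROC_BIND`) gives the kind of bandwidth comparison the abstract reports, though the numbers will depend on the machine.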
