使用具有处理器亲缘性的OpenMP任务改进内存带宽利用率

J. Arul, Chun Huang
{"title":"使用具有处理器亲缘性的OpenMP任务改进内存带宽利用率","authors":"J. Arul, Chun Huang","doi":"10.1109/ISNE.2015.7131947","DOIUrl":null,"url":null,"abstract":"The CPU design has been evolving for more than 30 years since the first x86 microprocessor. Recently, instead of increasing the CPU performance, the focus has shifted to multi-core architecture. Multi-core processor technology is rapidly evolving, but the memory interface is a limiting factor in fulfilling the needs of multi-core and multi-threaded processors. This is a big challenge for software developers. The run time thread is dynamically allocated to each processor core by the scheduler of the operating system. Current parallel programming researches only aim to load balance and keep the multi-core running efficiently. As a result, applications may have poor spatial data locality. This will also cause uneven memory bandwidth usage due to differences in memory access paths. The question of obtaining maximum memory bandwidth utilization by controlling the thread of a processor affinity is the main scope of this particular research. Memory bandwidth utilization of 62% (8786.87 MB/s to 14201.88 MB/s) was achieved, if appropriate processor affinity was set for thread placement. The OpenMP task level parallelism in addition to processor affinity resulted in 69% (8786.87 MB/s to 14802.69 MB/s) of improvement using 2 threads. Thus, task level parallelism combined with processor affinity greatly increases the level of parallelism in an OpenMP parallel programming environment. As a result, it can improve the overall performance of parallel applications.","PeriodicalId":152001,"journal":{"name":"2015 International Symposium on Next-Generation Electronics (ISNE)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Improvement of memory bandwidth utilization using OpenMP task with processor affinity\",\"authors\":\"J. Arul, Chun Huang\",\"doi\":\"10.1109/ISNE.2015.7131947\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The CPU design has been evolving for more than 30 years since the first x86 microprocessor. Recently, instead of increasing the CPU performance, the focus has shifted to multi-core architecture. Multi-core processor technology is rapidly evolving, but the memory interface is a limiting factor in fulfilling the needs of multi-core and multi-threaded processors. This is a big challenge for software developers. The run time thread is dynamically allocated to each processor core by the scheduler of the operating system. Current parallel programming researches only aim to load balance and keep the multi-core running efficiently. As a result, applications may have poor spatial data locality. This will also cause uneven memory bandwidth usage due to differences in memory access paths. The question of obtaining maximum memory bandwidth utilization by controlling the thread of a processor affinity is the main scope of this particular research. Memory bandwidth utilization of 62% (8786.87 MB/s to 14201.88 MB/s) was achieved, if appropriate processor affinity was set for thread placement. The OpenMP task level parallelism in addition to processor affinity resulted in 69% (8786.87 MB/s to 14802.69 MB/s) of improvement using 2 threads. Thus, task level parallelism combined with processor affinity greatly increases the level of parallelism in an OpenMP parallel programming environment. As a result, it can improve the overall performance of parallel applications.\",\"PeriodicalId\":152001,\"journal\":{\"name\":\"2015 International Symposium on Next-Generation Electronics (ISNE)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-05-04\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2015 International Symposium on Next-Generation Electronics (ISNE)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ISNE.2015.7131947\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2015 International Symposium on Next-Generation Electronics (ISNE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ISNE.2015.7131947","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

摘要

自第一个x86微处理器问世以来,CPU设计已经发展了30多年。最近,人们的关注点不再是提高CPU性能,而是转向了多核架构。多核处理器技术正在迅速发展,但内存接口是满足多核和多线程处理器需求的一个限制因素。这对软件开发人员来说是一个巨大的挑战。运行时线程由操作系统的调度程序动态地分配给每个处理器核心。当前的并行编程研究仅着眼于负载均衡和多核高效运行。因此,应用程序可能具有较差的空间数据局部性。由于内存访问路径的不同,这也会导致内存带宽使用不均匀。通过控制处理器关联线程获得最大内存带宽利用率的问题是本研究的主要范围。如果为线程放置设置了适当的处理器亲和性,内存带宽利用率可以达到62% (8786.87 MB/s到14201.88 MB/s)。OpenMP任务级并行性和处理器亲和性使得使用2个线程的性能提高了69%(从8786.87 MB/s到14802.69 MB/s)。因此,在OpenMP并行编程环境中,任务级并行性与处理器亲缘性相结合大大提高了并行性的水平。因此,它可以提高并行应用程序的整体性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Improvement of memory bandwidth utilization using OpenMP task with processor affinity
The CPU design has been evolving for more than 30 years since the first x86 microprocessor. Recently, instead of increasing the CPU performance, the focus has shifted to multi-core architecture. Multi-core processor technology is rapidly evolving, but the memory interface is a limiting factor in fulfilling the needs of multi-core and multi-threaded processors. This is a big challenge for software developers. The run time thread is dynamically allocated to each processor core by the scheduler of the operating system. Current parallel programming researches only aim to load balance and keep the multi-core running efficiently. As a result, applications may have poor spatial data locality. This will also cause uneven memory bandwidth usage due to differences in memory access paths. The question of obtaining maximum memory bandwidth utilization by controlling the thread of a processor affinity is the main scope of this particular research. Memory bandwidth utilization of 62% (8786.87 MB/s to 14201.88 MB/s) was achieved, if appropriate processor affinity was set for thread placement. The OpenMP task level parallelism in addition to processor affinity resulted in 69% (8786.87 MB/s to 14802.69 MB/s) of improvement using 2 threads. Thus, task level parallelism combined with processor affinity greatly increases the level of parallelism in an OpenMP parallel programming environment. As a result, it can improve the overall performance of parallel applications.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信