Improvement of memory bandwidth utilization using OpenMP task with processor affinity
J. Arul, Chun Huang
2015 International Symposium on Next-Generation Electronics (ISNE), 4 May 2015
DOI: 10.1109/ISNE.2015.7131947
Citations: 1
Abstract
CPU design has been evolving for more than 30 years since the first x86 microprocessor. Recently, the focus has shifted from increasing single-core CPU performance to multi-core architectures. Multi-core processor technology is evolving rapidly, but the memory interface limits how well the demands of multi-core and multi-threaded processors can be met, which poses a major challenge for software developers. At run time, the operating system scheduler dynamically assigns threads to processor cores. Current parallel programming research aims only at load balancing and keeping all cores running efficiently. As a result, applications may exhibit poor spatial data locality, and differences in memory access paths can cause uneven memory bandwidth usage. This research focuses on maximizing memory bandwidth utilization by controlling the processor affinity of threads. Setting an appropriate processor affinity for thread placement improved memory bandwidth by 62% (from 8786.87 MB/s to 14201.88 MB/s). Combining OpenMP task-level parallelism with processor affinity yielded a 69% improvement (from 8786.87 MB/s to 14802.69 MB/s) using 2 threads. Thus, task-level parallelism combined with processor affinity greatly increases the level of parallelism in an OpenMP programming environment and can improve the overall performance of parallel applications.