Minjie Wu, Weiwei Wu, N. Tai, Hongyu Zhao, Jiawu Fan, N. Yuan
Title: Research on OpenMP model of the parallel programming technology for homogeneous multicore DSP
Published in: 2014 IEEE 5th International Conference on Software Engineering and Service Science, pp. 921-924
Publication date: 2014-06-27
DOI: 10.1109/ICSESS.2014.6933715
Citations: 5
Abstract
As application complexity continues to grow, multicore processors have proved to be an effective way to meet the ever-increasing processing demand across the industry. The Master/Slave model, the Data Flow model and the OpenMP model are the three dominant models for parallel programming. This paper briefly discusses the first two and focuses on the OpenMP model. It also studies several factors that affect the execution performance of OpenMP programs, such as the number of threads, the scheduling strategy and the load balance. The paper presents a method that uses the OpenMP model to implement image edge detection on the TMS320C6678 DSP platform. The experimental results show that the OpenMP model offers better scalability and flexibility than the Master/Slave model and the Data Flow model. The best performance is obtained when the number of threads equals the number of cores available on the platform. With all eight cores of the TMS320C6678 DSP in use simultaneously, edge detection on an image of 1024×768 pixels takes just 6.192 ms, a time saving of 32.10% compared with the Master/Slave model. Furthermore, as the number of cores increases from 1 to 8, the execution time decreases and the resulting speedup approximately conforms to Gustafson's law. With 8 cores, the speedup reaches 7.233.
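To make the approach described in the abstract concrete, the following is a minimal sketch of row-parallel edge detection with OpenMP. It is not the authors' implementation: the abstract does not name the edge operator, so the Sobel operator is assumed here, along with an 8-bit grayscale image in row-major order and the |gx| + |gy| approximation of the gradient magnitude. The function name `sobel_edges` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sobel edge detection on an 8-bit grayscale, row-major image.
 * ASSUMPTION: Sobel is chosen for illustration only; the paper's abstract
 * does not say which operator was used. The gradient magnitude is
 * approximated as |gx| + |gy| and clamped to 0..255. Border pixels,
 * which lack a full 3x3 neighbourhood, are set to 0. */
void sobel_edges(const uint8_t *src, uint8_t *dst, int width, int height)
{
    assert(width >= 3 && height >= 3);

    /* Clear the top and bottom rows, then the left and right columns. */
    for (int x = 0; x < width; ++x) {
        dst[x] = 0;
        dst[(size_t)(height - 1) * width + x] = 0;
    }
    for (int y = 0; y < height; ++y) {
        dst[(size_t)y * width] = 0;
        dst[(size_t)y * width + width - 1] = 0;
    }

    /* Each output row depends only on read-only input rows, so rows can be
     * divided among threads without synchronization. schedule(static) hands
     * each thread an equal contiguous block of rows, which fits a workload
     * where every row costs the same. */
    #pragma omp parallel for schedule(static)
    for (int y = 1; y < height - 1; ++y) {
        const uint8_t *up  = src + (size_t)(y - 1) * width;
        const uint8_t *mid = src + (size_t)y * width;
        const uint8_t *dn  = src + (size_t)(y + 1) * width;
        for (int x = 1; x < width - 1; ++x) {
            /* Horizontal gradient: right column minus left column. */
            int gx = (up[x + 1] + 2 * mid[x + 1] + dn[x + 1])
                   - (up[x - 1] + 2 * mid[x - 1] + dn[x - 1]);
            /* Vertical gradient: bottom row minus top row. */
            int gy = (dn[x - 1] + 2 * dn[x] + dn[x + 1])
                   - (up[x - 1] + 2 * up[x] + up[x + 1]);
            int mag = abs(gx) + abs(gy);
            dst[(size_t)y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}
```

For a 1024×768 image, a caller would allocate two buffers of 1024 * 768 bytes and call `sobel_edges(src, dst, 1024, 768)`. When the code is built with an OpenMP-enabled compiler (for example `-fopenmp` with GCC), the number of threads can be set through the standard `OMP_NUM_THREADS` environment variable or a `num_threads` clause; without OpenMP support the pragma is ignored and the loop runs serially with the same result. For reference, the reported speedup of 7.233 on 8 cores corresponds to a parallel efficiency of 7.233 / 8 ≈ 0.90.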