Research on OpenMP model of the parallel programming technology for homogeneous multicore DSP

Minjie Wu, Weiwei Wu, N. Tai, Hongyu Zhao, Jiawu Fan, N. Yuan
Published in: 2014 IEEE 5th International Conference on Software Engineering and Service Science, pp. 921-924. Publication date: 2014-06-27. DOI: 10.1109/ICSESS.2014.6933715 (https://doi.org/10.1109/ICSESS.2014.6933715)
Citations: 5

Abstract

Research on OpenMP model of the parallel programming technology for homogeneous multicore DSP
As application complexity continues to grow, multicore processors have proved to be an effective way to meet the ever-increasing processing demand across the industry. The Master/Slave model, the Data Flow model, and the OpenMP model are the three dominant models for parallel programming. This paper briefly discusses the first two models and focuses on the OpenMP model. It also studies several factors that affect the execution performance of OpenMP programs, such as the number of threads, the scheduling strategy, and load balance. The paper presents a method that uses the OpenMP model to implement image edge detection on the TMS320C6678 DSP platform. The experimental results show that the OpenMP model offers better scalability and flexibility than the Master/Slave model and the Data Flow model. The best performance is obtained when the number of threads equals the number of cores available on the platform. With all eight cores of the TMS320C6678 DSP in use, edge detection on a 1024×768-pixel image takes just 6.192 ms, which is 32.10% less time than the Master/Slave model requires. Furthermore, as the number of cores increases from 1 to 8, the execution time decreases and the resulting speedup approximately conforms to Gustafson's law. With 8 cores, the speedup reaches 7.233.
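To make the abstract's two technical claims concrete, the following is a minimal sketch in generic OpenMP C, not the authors' code. It assumes a Sobel operator for the edge detector (the abstract does not name one), a row-wise `parallel for` with static scheduling, and the standard form of Gustafson's law, S(N) = N - s(N - 1), where s is the serial fraction. The function names `sobel_edges`, `gustafson_speedup`, and `implied_serial_fraction` are hypothetical and chosen only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumed edge detector: Sobel gradient magnitude |gx| + |gy|, clamped to 255.
 * The abstract does not say which operator the paper uses.
 * Preconditions (assumptions): width >= 3, height >= 3, num_threads >= 1.
 * Border pixels are written as 0 because they lack a full 3x3 neighborhood. */
void sobel_edges(const uint8_t *src, uint8_t *dst,
                 int width, int height, int num_threads)
{
    for (int x = 0; x < width; x++) {
        dst[x] = 0;
        dst[(height - 1) * width + x] = 0;
    }
    for (int y = 0; y < height; y++) {
        dst[y * width] = 0;
        dst[y * width + width - 1] = 0;
    }

    /* Each output row depends only on the read-only input, so rows can be
     * distributed across threads. Static scheduling fits here because every
     * row costs the same amount of work. Without OpenMP enabled, the pragma
     * is ignored and the loop simply runs serially. */
    #pragma omp parallel for schedule(static) num_threads(num_threads)
    for (int y = 1; y < height - 1; y++) {
        for (int x = 1; x < width - 1; x++) {
            const uint8_t *p = src + y * width + x;
            int gx = -p[-width - 1] + p[-width + 1]
                     - 2 * p[-1]    + 2 * p[1]
                     - p[width - 1] + p[width + 1];
            int gy = -p[-width - 1] - 2 * p[-width] - p[-width + 1]
                     + p[width - 1] + 2 * p[width]  + p[width + 1];
            int mag = abs(gx) + abs(gy);
            dst[y * width + x] = (uint8_t)(mag > 255 ? 255 : mag);
        }
    }
}

/* Gustafson's law: scaled speedup on n cores for serial fraction s. */
double gustafson_speedup(int n, double serial_fraction)
{
    return n - serial_fraction * (n - 1);
}

/* Inverse of the law: the serial fraction that a measured speedup implies.
 * Requires n > 1. This is a derived figure, not a value reported in the paper. */
double implied_serial_fraction(int n, double speedup)
{
    return (n - speedup) / (double)(n - 1);
}
```

Under these assumptions, calling `sobel_edges(src, dst, 1024, 768, 8)` mirrors the eight-thread configuration in the abstract, and `implied_serial_fraction(8, 7.233)` gives roughly 0.11, meaning the reported speedup would correspond to about 11% serial work if Gustafson's law held exactly. Setting the thread count equal to the core count matches the abstract's observation about best performance; with more threads than cores, the extra threads would have to share cores.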