Configuration of Parallel Real-Time Applications on Multi-Core Processors

2022 IEEE 20th International Conference on Industrial Informatics (INDIN). Pub Date: 2022-07-25. DOI: 10.1109/INDIN51773.2022.9976163

Mohammad Samadi Gharajeh, Tiago Carvalho, L. M. Pinho

Citations: 0

Abstract

Parallel programming models (e.g., OpenMP) are increasingly used to improve the performance of real-time applications on modern processors. However, these processors have complex architectures, which makes their timing behavior very difficult to understand. The main limitation of most existing work is that it either applies static timing analysis to simpler models, or applies measurement-based analysis on traditional platforms (e.g., single core) or to sequential algorithms only. How to provide an efficient configuration for allocating a parallel program to the computing units of the processor is still an open challenge. This paper studies the problem of performing timing analysis on complex multi-core platforms, and proposes a methodology to understand the timing behavior of applications and to guide the configuration of the platform. As an example, the paper uses an OpenMP-based program of the Heat benchmark on an NVIDIA Jetson AGX Xavier. The main objectives are to analyze the execution time of OpenMP tasks, specify the best configuration of OpenMP directives, identify critical tasks, and discuss the predictability of the system/application. A Linux perf-based measurement tool, which has been extended by our team, is applied to measure each task across multiple executions in terms of total CPU cycles, the number of cache accesses, and the number of cache misses at different cache levels, including L1, L2 and L3. The evaluation uses the performance metrics measured by our tool to study the predictability of the system/application.
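To make the kind of per-task analysis described in the abstract more concrete, the following is a minimal sketch, not the authors' tool. It assumes that per-task counters (cycles, cache accesses, cache misses) have already been collected for several executions and stored in a dictionary. The task names, the data layout, the sample values, and the helper functions `summarize` and `critical_task` are all hypothetical. The sketch also assumes that "critical" means the task with the largest observed cycle count, and that predictability can be approximated by the coefficient of variation of the cycle counts; the paper may use different criteria.

```python
from statistics import mean, pstdev

# Hypothetical per-task samples, one dict per execution of the program.
# The numbers are synthetic placeholders, NOT measurements from the paper.
samples = {
    "task_a": [
        {"cycles": 1000, "cache_accesses": 200, "cache_misses": 20},
        {"cycles": 1100, "cache_accesses": 200, "cache_misses": 30},
        {"cycles": 900, "cache_accesses": 200, "cache_misses": 10},
    ],
    "task_b": [
        {"cycles": 5000, "cache_accesses": 1000, "cache_misses": 50},
        {"cycles": 5000, "cache_accesses": 1000, "cache_misses": 50},
        {"cycles": 5000, "cache_accesses": 1000, "cache_misses": 50},
    ],
}


def summarize(runs):
    """Aggregate the counters of one task over all of its executions."""
    cycles = [r["cycles"] for r in runs]
    accesses = sum(r["cache_accesses"] for r in runs)
    misses = sum(r["cache_misses"] for r in runs)
    avg = mean(cycles)
    return {
        "mean_cycles": avg,
        "max_cycles": max(cycles),
        # Coefficient of variation: lower means more repeatable timing.
        "cv": pstdev(cycles) / avg if avg else 0.0,
        # Overall miss ratio across all executions of this task.
        "miss_ratio": misses / accesses if accesses else 0.0,
    }


def critical_task(all_samples):
    """Return the task with the largest observed cycle count (assumed criterion)."""
    summaries = {name: summarize(runs) for name, runs in all_samples.items()}
    return max(summaries, key=lambda name: summaries[name]["max_cycles"])


if __name__ == "__main__":
    for name, runs in samples.items():
        print(name, summarize(runs))
    print("critical task:", critical_task(samples))
```

In practice the same summary could be computed separately for each cache level (L1, L2, L3) and for each candidate configuration of OpenMP directives, so that configurations can be compared on both average cost and variability.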
