具有多干扰母线的优先级MPSoC总线的精确、快速跟踪感知性能估计模型

Q4 Engineering

IPSJ Transactions on System LSI Design Methodology Pub Date : 2017-01-01 DOI:10.2197/ipsjtsldm.10.13

F. Shafiq, T. Isshiki, Dongju Li

{"title":"具有多干扰母线的优先级MPSoC总线的精确、快速跟踪感知性能估计模型","authors":"F. Shafiq, T. Isshiki, Dongju Li","doi":"10.2197/ipsjtsldm.10.13","DOIUrl":null,"url":null,"abstract":"Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity associated with such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities by using performance estimation techniques at early stages of design to make design decisions earlier in the design cycle. While software developers need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared resource contention and arbitration. Moreover, synthetic traffic patterns have been used to analyze the communication architecture however, such patterns are not realistic enough. We propose a statistical based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload trace of the actual applications and benchmarks to capture the real application traffic behavior. Statistics on the traffic patterns are collected and input to the analytical model which calculates performance values for the communication architecture under consideration. By knowing the performance measures, designers can avoid over and under-design of the communication architecture. This paper builds up on a previously developed performance estimation model. The previous work modeled single and burst bus-transfers, however only one interfering bus master at a time for each blocked bus request was considered. The proposed, improved accuracy model considers multiple interfering masters for each blocked request hence improving the estimation accuracy especially for traffic intensive applications and many PE architectures. Experiments are performed for two different architectures i.e., 4 processing elements connected via a shared bus and 8 processing elements connected via a shared bus. Results show no significant difference in accuracy compared to previously developed model, for low traffic applications SPARSE and ROBOT however notable accuracy improvement for traffic intensive applications. Maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from maximum 13.91% to 8.8% for FFT on the 4PE architecture. On the 8PE architecture, maximum estimation error is reduced from 11.8% to 2.7% for the FPPP benchmark. Moreover simulation speed-up for the proposed technique over simulation method is reported.","PeriodicalId":38964,"journal":{"name":"IPSJ Transactions on System LSI Design Methodology","volume":"82 1","pages":"13-27"},"PeriodicalIF":0.0000,"publicationDate":"2017-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An Accurate and Fast Trace-aware Performance Estimation Model For Prioritized MPSoC Bus With Multiple Interfering Bus-Masters\",\"authors\":\"F. Shafiq, T. Isshiki, Dongju Li\",\"doi\":\"10.2197/ipsjtsldm.10.13\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity associated with such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities by using performance estimation techniques at early stages of design to make design decisions earlier in the design cycle. While software developers need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared resource contention and arbitration. Moreover, synthetic traffic patterns have been used to analyze the communication architecture however, such patterns are not realistic enough. We propose a statistical based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload trace of the actual applications and benchmarks to capture the real application traffic behavior. Statistics on the traffic patterns are collected and input to the analytical model which calculates performance values for the communication architecture under consideration. By knowing the performance measures, designers can avoid over and under-design of the communication architecture. This paper builds up on a previously developed performance estimation model. The previous work modeled single and burst bus-transfers, however only one interfering bus master at a time for each blocked bus request was considered. The proposed, improved accuracy model considers multiple interfering masters for each blocked request hence improving the estimation accuracy especially for traffic intensive applications and many PE architectures. Experiments are performed for two different architectures i.e., 4 processing elements connected via a shared bus and 8 processing elements connected via a shared bus. Results show no significant difference in accuracy compared to previously developed model, for low traffic applications SPARSE and ROBOT however notable accuracy improvement for traffic intensive applications. Maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from maximum 13.91% to 8.8% for FFT on the 4PE architecture. On the 8PE architecture, maximum estimation error is reduced from 11.8% to 2.7% for the FPPP benchmark. Moreover simulation speed-up for the proposed technique over simulation method is reported.\",\"PeriodicalId\":38964,\"journal\":{\"name\":\"IPSJ Transactions on System LSI Design Methodology\",\"volume\":\"82 1\",\"pages\":\"13-27\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IPSJ Transactions on System LSI Design Methodology\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.2197/ipsjtsldm.10.13\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Engineering\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IPSJ Transactions on System LSI Design Methodology","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.2197/ipsjtsldm.10.13","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Engineering","Score":null,"Total":0}

引用次数: 0

摘要

由于多核体系结构的复杂性，现代和未来多核系统的准确和快速的性能评估方法是许多研究的焦点。这类系统的通信架构对整个系统的性能和功耗有很大的影响。架构师需要通过在设计的早期阶段使用性能评估技术来探索许多设计可能性，从而在设计周期的早期做出设计决策。而软件开发人员需要为目标体系结构开发和测试应用程序，并在设计周期中尽可能早地收集性能度量。全系统仿真技术提供准确的性能值，但非常耗时。静态分析技术速度很快，但不能捕获与共享资源争用和仲裁相关的动态行为。此外，合成流量模式已被用于分析通信体系结构，但这种模式不够真实。我们提出了一个基于统计的模型来预测总线仲裁对总线架构性能的动态成本。建议的模型使用实际应用程序的工作负载跟踪和基准测试来捕获实际应用程序流量行为。收集流量模式的统计数据并输入到分析模型中，该模型计算所考虑的通信体系结构的性能值。通过了解性能度量，设计人员可以避免通信体系结构的过度设计和设计不足。本文建立在先前开发的性能评估模型的基础上。先前的工作建模了单总线和突发总线传输，但是对于每个阻塞的总线请求，每次只考虑一个干扰总线主。提出的改进精度模型考虑了每个阻塞请求的多个干扰主节点，从而提高了估计精度，特别是对于流量密集型应用和许多PE架构。实验针对两种不同的架构进行，即通过共享总线连接的4个处理元件和通过共享总线连接的8个处理元件。结果表明，与之前开发的模型相比，在低流量应用中，稀疏和机器人的精度没有显著差异，但在流量密集应用中，精度有显著提高。在4PE架构上，FPPPP的最大估计误差从1.75%降低到0.6%，FFT的最大估计误差从13.91%降低到8.8%。在8PE架构上，FPPP基准的最大估计误差从11.8%降低到2.7%。此外，本文还报道了该方法的仿真速度优于仿真方法。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

An Accurate and Fast Trace-aware Performance Estimation Model For Prioritized MPSoC Bus With Multiple Interfering Bus-Masters

Accurate and fast performance estimation methods for modern and future multi-core systems are the focal point of much research due to the complexity associated with such architectures. The communication architecture of such systems has a huge impact on the performance and power of the whole system. Architects need to explore many design possibilities by using performance estimation techniques at early stages of design to make design decisions earlier in the design cycle. While software developers need to develop and test applications for the target architecture and gather performance measurements as early in the design cycle as possible. Full system simulation techniques provide accurate performance values but are extremely time consuming. Static analysis techniques are fast but cannot capture the dynamic behavior associated with shared resource contention and arbitration. Moreover, synthetic traffic patterns have been used to analyze the communication architecture however, such patterns are not realistic enough. We propose a statistical based model to predict the dynamic cost of bus arbitration on the performance of a bus architecture. The proposed model uses workload trace of the actual applications and benchmarks to capture the real application traffic behavior. Statistics on the traffic patterns are collected and input to the analytical model which calculates performance values for the communication architecture under consideration. By knowing the performance measures, designers can avoid over and under-design of the communication architecture. This paper builds up on a previously developed performance estimation model. The previous work modeled single and burst bus-transfers, however only one interfering bus master at a time for each blocked bus request was considered. The proposed, improved accuracy model considers multiple interfering masters for each blocked request hence improving the estimation accuracy especially for traffic intensive applications and many PE architectures. Experiments are performed for two different architectures i.e., 4 processing elements connected via a shared bus and 8 processing elements connected via a shared bus. Results show no significant difference in accuracy compared to previously developed model, for low traffic applications SPARSE and ROBOT however notable accuracy improvement for traffic intensive applications. Maximum estimation error is reduced from 1.75% to 0.6% for FPPPP and from maximum 13.91% to 8.8% for FFT on the 4PE architecture. On the 8PE architecture, maximum estimation error is reduced from 11.8% to 2.7% for the FPPP benchmark. Moreover simulation speed-up for the proposed technique over simulation method is reported.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

IPSJ Transactions on System LSI Design Methodology Engineering-Electrical and Electronic Engineering

CiteScore

1.20

自引率

0.00%

发文量