Throughput optimization via cache partitioning for embedded multiprocessors
A. Molnos, S. Cotofana, M. Heijligers, J. V. Eijndhoven
2006 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, July 2006. DOI: 10.1109/ICSAMOS.2006.300826
In embedded multiprocessors, cache partitioning is a known technique for eliminating inter-task cache conflicts and thereby increasing predictability. On such systems, the partitioning ratio is a parameter that should be tuned to optimize performance. In this paper we propose a simulated annealing (SA) based heuristic to determine the cache partitioning ratio that maximizes an application's throughput. At its core, the SA method iterates over many partitioning ratios, checking the resulting throughput. The throughput of the system therefore has to be estimated very quickly, so we utilize a lightweight simulation strategy. The lightweight simulation derives the throughput from task timings gathered off-line. This is possible because, in an environment where tasks do not interfere with each other, their performance figures can be combined for any possible partitioning. An application of industrial relevance (an H.264 decoder) running on a parallel homogeneous platform is used to demonstrate the proposed method. For the H.264 application, a 9% throughput improvement is achieved compared to the throughput obtained by partitioning for the least number of cache misses. This is a significant improvement, as it represents 45% of the theoretical throughput improvement achievable when assuming an infinite cache.
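The search loop described in the abstract (iterate over many candidate partitioning ratios, estimate the throughput of each from per-task timings measured off-line, keep the best) can be sketched as follows. This is a minimal illustration under assumed names and a made-up bottleneck cost model: estimate_throughput, task_timings, the neighbour move, and the example numbers are all hypothetical placeholders, not the authors' implementation or the paper's exact light-simulation model.

```python
# Sketch of a simulated-annealing search over cache partitioning ratios.
# All names and numbers below are illustrative assumptions, not the paper's code.
import math
import random

def estimate_throughput(partition, task_timings):
    """'Light simulation' stand-in: look up the off-line measured execution time
    of each task for its assigned number of cache ways. Assuming non-interfering
    pipelined tasks, the slowest task bounds the application throughput."""
    bottleneck = max(task_timings[task][ways] for task, ways in partition.items())
    return 1.0 / bottleneck

def neighbour(partition):
    """Propose a nearby partitioning ratio: move one cache way between two tasks."""
    new = dict(partition)
    src, dst = random.sample(list(new), 2)
    if new[src] > 1:          # keep at least one way per task
        new[src] -= 1
        new[dst] += 1
    return new

def anneal(initial, task_timings, t0=1.0, cooling=0.95, iterations=2000):
    """Iterate over many partitioning ratios, occasionally accepting worse ones
    with a temperature-dependent probability to escape local optima."""
    current, best = initial, initial
    temperature = t0
    for _ in range(iterations):
        candidate = neighbour(current)
        delta = (estimate_throughput(candidate, task_timings)
                 - estimate_throughput(current, task_timings))
        if delta > 0 or random.random() < math.exp(delta / temperature):
            current = candidate
        if estimate_throughput(current, task_timings) > estimate_throughput(best, task_timings):
            best = current
        temperature *= cooling
    return best

# Example: three tasks sharing an 8-way cache; timings[t][w] is the (made-up)
# off-line execution time of task t when given w cache ways.
timings = {
    "cavlc":   {w: 100 - 6 * w for w in range(1, 9)},
    "idct":    {w: 80 - 4 * w for w in range(1, 9)},
    "deblock": {w: 120 - 9 * w for w in range(1, 9)},
}
print(anneal({"cavlc": 3, "idct": 2, "deblock": 3}, timings))
```

The point mirrored here is the one the abstract makes: because partitioned tasks do not interfere, per-task timings gathered once off-line can be recombined for any candidate ratio, so each SA iteration costs only table lookups rather than a full cycle-accurate simulation.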