RAPITIMATE: Rapid performance estimation of pipelined processing systems containing shared memory
S. Min, Kapil Batra, Yusuke Yachide, Jorgen Peddersen, S. Parameswaran
2015 33rd IEEE International Conference on Computer Design (ICCD), published 2015-10-18. DOI: 10.1109/ICCD.2015.7357175
A pipeline of processors can significantly increase the throughput of streaming applications. Communication between processors in such a system can occur via FIFOs, shared memory, or both, and using a cache for the shared memory can improve performance. To see the effect of differing cache configurations (size, line size, and associativity) on performance, a full system simulation must typically be performed for each configuration. Rapid performance estimation is difficult because the cache is accessed by many processors. In this paper, for the first time, we show a method to estimate the performance of a pipelined processor system in the presence of caches of differing sizes connected to main memory. By performing full simulations for only a few cache configurations, using those simulations to estimate the hits and misses of other configurations, and then carefully annotating trace timings with the estimated hits and misses, we are able to estimate the throughput of a pipelined system to within 90% of its actual value. Estimation takes less than 10% of full simulation time. The estimated values have an average fidelity of 0.97 (1 being perfectly correlated) with the actual values.
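The central trick the abstract describes — deriving hit/miss counts for cache configurations that were never fully simulated — can be illustrated with a classic single-pass technique, LRU stack-distance analysis: one pass over an address trace yields the hit count of every fully associative LRU cache size at once, and those counts can then annotate trace timings. This is a minimal sketch of that general idea only; the paper's actual method, the trace, the latencies, and all function names below are illustrative assumptions, not the authors' implementation.

```python
def stack_distances(trace, line_size=32):
    """One pass over an address trace, returning the LRU stack distance
    of each access (float('inf') marks a cold miss)."""
    stack = []   # cache-line tags, most recently used first
    dists = []
    for addr in trace:
        tag = addr // line_size
        if tag in stack:
            d = stack.index(tag)   # depth in the LRU stack
            stack.remove(tag)
        else:
            d = float('inf')       # never seen before: cold miss
        stack.insert(0, tag)       # tag becomes most recently used
        dists.append(d)
    return dists

def estimate_cycles(trace, cache_lines, hit_cost=1, miss_cost=20, line_size=32):
    """Estimate memory-access cycles for a fully associative LRU cache of
    `cache_lines` lines. An access hits iff its stack distance < cache_lines,
    so one trace pass serves every candidate cache size."""
    dists = stack_distances(trace, line_size)
    hits = sum(1 for d in dists if d < cache_lines)
    misses = len(dists) - hits
    return hits * hit_cost + misses * miss_cost
```

Because the stack distances are computed once and reused, sweeping cache sizes costs almost nothing beyond the initial trace pass — the same economy the paper seeks by amortizing a few full simulations across many configurations (real set-associative caches with sharing processors need the paper's more careful treatment).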