RAPITIMATE: Rapid performance estimation of pipelined processing systems containing shared memory
S. Min, Kapil Batra, Yusuke Yachide, Jorgen Peddersen, S. Parameswaran
2015 33rd IEEE International Conference on Computer Design (ICCD), published 2015-10-18. DOI: 10.1109/ICCD.2015.7357175
A pipeline of processors can significantly increase the throughput of streaming applications. Communication between processors in such a system can occur via FIFOs, shared memory, or both, and using a cache for the shared memory can improve performance. To see the effect of differing cache configurations (size, line size, and associativity) on performance, a full system simulation must typically be performed for each configuration. Rapid performance estimation is difficult because the cache is accessed by many processors. In this paper, for the first time, we show a method to estimate the performance of a pipelined processor system in the presence of caches of differing sizes connected to main memory. By performing full simulations for only a few cache configurations, using those simulations to estimate the hits and misses of other configurations, and then carefully annotating trace timings with the estimated hits and misses, we are able to estimate the throughput of a pipelined system to within 90% of its actual value. Estimation takes less than 10% of full simulation time. The estimated values have an average fidelity of 0.97 (1 being perfectly correlated) with the actual values.
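The central trick the abstract describes — deriving hit/miss counts for cache configurations that were never fully simulated — can be illustrated with a classic single-pass technique, LRU stack-distance analysis: one pass over an address trace yields the hit count of every fully associative LRU cache size at once, and those counts can then annotate trace timings. This is a minimal sketch of that general idea only; the paper's actual method, the trace, the latencies, and all function names below are illustrative assumptions, not the authors' implementation.

```python
def stack_distances(trace, line_size=32):
    """One pass over an address trace, returning the LRU stack distance
    of each access (float('inf') marks a cold miss)."""
    stack = []   # cache-line tags, most recently used first
    dists = []
    for addr in trace:
        tag = addr // line_size
        if tag in stack:
            d = stack.index(tag)   # depth in the LRU stack
            stack.remove(tag)
        else:
            d = float('inf')       # never seen before: cold miss
        stack.insert(0, tag)       # tag becomes most recently used
        dists.append(d)
    return dists

def estimate_cycles(trace, cache_lines, hit_cost=1, miss_cost=20, line_size=32):
    """Estimate memory-access cycles for a fully associative LRU cache of
    `cache_lines` lines. An access hits iff its stack distance < cache_lines,
    so one trace pass serves every candidate cache size."""
    dists = stack_distances(trace, line_size)
    hits = sum(1 for d in dists if d < cache_lines)
    misses = len(dists) - hits
    return hits * hit_cost + misses * miss_cost
```

Because the stack distances are computed once and reused, sweeping cache sizes costs almost nothing beyond the initial trace pass — the same economy the paper seeks by amortizing a few full simulations across many configurations (real set-associative caches with sharing processors need the paper's more careful treatment).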