A. Nguyen, P. Bose, K. Ekanadham, Ashwini K. Nanda, Maged M. Michael
{"title":"并行轨迹驱动建筑仿真的精度和加速","authors":"A. Nguyen, P. Bose, K. Ekanadham, Ashwini K. Nanda, Maged M. Michael","doi":"10.1109/IPPS.1997.580842","DOIUrl":null,"url":null,"abstract":"Trace-driven simulation continues to be one of the main evaluation methods in the design of high performance processor-memory sub-systems. In this paper, we examine the varying speed-up opportunities available by processing a given trace in parallel on an IBM SP-2 machine. We also develop a simple, yet effective method of correcting for cold-start cache miss errors, by the use of overlapped trace chunks. We then report selected experimental results to validate our expectations. We show that it is possible to achieve near-perfect speedup without loss of accuracy. Next, in order to achieve further reduction in simulation cost, we combine uniform sampling methods with parallel trace processing with a slight loss of accuracy for finite-cache timer runs. We then show that by using warm-start sequences from preceding trace chunks, it is possible to reduce the errors back to acceptable bounds.","PeriodicalId":145892,"journal":{"name":"Proceedings 11th International Parallel Processing Symposium","volume":"21 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1997-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"Accuracy and speed-up of parallel trace-driven architectural simulation\",\"authors\":\"A. Nguyen, P. Bose, K. Ekanadham, Ashwini K. Nanda, Maged M. Michael\",\"doi\":\"10.1109/IPPS.1997.580842\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Trace-driven simulation continues to be one of the main evaluation methods in the design of high performance processor-memory sub-systems. In this paper, we examine the varying speed-up opportunities available by processing a given trace in parallel on an IBM SP-2 machine. We also develop a simple, yet effective method of correcting for cold-start cache miss errors, by the use of overlapped trace chunks. We then report selected experimental results to validate our expectations. We show that it is possible to achieve near-perfect speedup without loss of accuracy. Next, in order to achieve further reduction in simulation cost, we combine uniform sampling methods with parallel trace processing with a slight loss of accuracy for finite-cache timer runs. We then show that by using warm-start sequences from preceding trace chunks, it is possible to reduce the errors back to acceptable bounds.\",\"PeriodicalId\":145892,\"journal\":{\"name\":\"Proceedings 11th International Parallel Processing Symposium\",\"volume\":\"21 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1997-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 11th International Parallel Processing Symposium\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPPS.1997.580842\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 11th International Parallel Processing Symposium","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPPS.1997.580842","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Accuracy and speed-up of parallel trace-driven architectural simulation
Trace-driven simulation continues to be one of the main evaluation methods in the design of high performance processor-memory sub-systems. In this paper, we examine the varying speed-up opportunities available by processing a given trace in parallel on an IBM SP-2 machine. We also develop a simple, yet effective method of correcting for cold-start cache miss errors, by the use of overlapped trace chunks. We then report selected experimental results to validate our expectations. We show that it is possible to achieve near-perfect speedup without loss of accuracy. Next, in order to achieve further reduction in simulation cost, we combine uniform sampling methods with parallel trace processing with a slight loss of accuracy for finite-cache timer runs. We then show that by using warm-start sequences from preceding trace chunks, it is possible to reduce the errors back to acceptable bounds.