D. Cheriton, H. A. Goosen, H. Holbrook, P. Machanick
{"title":"重构并行模拟以改善共享内存多处理器中的缓存行为:分布式同步的价值","authors":"D. Cheriton, H. A. Goosen, H. Holbrook, P. Machanick","doi":"10.1145/158459.158480","DOIUrl":null,"url":null,"abstract":"Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior.\nThis paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.","PeriodicalId":194781,"journal":{"name":"Workshop on Parallel and Distributed Simulation","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1993-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"20","resultStr":"{\"title\":\"Restructuring a parallel simulation to improve cache behavior in a shared-memory multiprocessor: the value of distributed synchronization\",\"authors\":\"D. Cheriton, H. A. Goosen, H. Holbrook, P. Machanick\",\"doi\":\"10.1145/158459.158480\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior.\\nThis paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.\",\"PeriodicalId\":194781,\"journal\":{\"name\":\"Workshop on Parallel and Distributed Simulation\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1993-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"20\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Workshop on Parallel and Distributed Simulation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/158459.158480\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Workshop on Parallel and Distributed Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/158459.158480","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Restructuring a parallel simulation to improve cache behavior in a shared-memory multiprocessor: the value of distributed synchronization
Synchronization is a significant cost in many parallel programs, and can be a major bottleneck if it is handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases the sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior.
This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest neighbors synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming beyond that of conventional approaches.