Mixed-Parallel Implementations of Extrapolation Methods with Reduced Synchronization Overhead for Large Shared-Memory Computers
Matthias Korch, T. Rauber, C. Scholtes
2010 IEEE 16th International Conference on Parallel and Distributed Systems, 2010-12-08
DOI: 10.1109/ICPADS.2010.12 (https://doi.org/10.1109/ICPADS.2010.12)
Extrapolation methods belong to the class of one-step methods for the solution of systems of ordinary differential equations (ODEs). In this paper, we present parallel implementation variants of extrapolation methods for large shared-memory computer systems which exploit pure data parallelism or mixed task and data parallelism and make use of different load balancing strategies and different loop structures. In addition to general implementation variants suitable for ODE systems with arbitrary access structure, we devise specialized implementation variants which exploit the specific access structure of a large class of ODE systems to reduce synchronization costs and to improve the locality of memory references. We analyze and compare the scalability and the locality behavior of the implementation variants on an SGI Altix 4700 using up to 500 threads.
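To make the structure of an extrapolation method concrete, the following is a minimal sequential sketch of one time step. It is not the authors' implementation. The abstract does not name the base method or the step-number sequence, so explicit Euler and the harmonic sequence 1, 2, ..., r are assumptions chosen for brevity, and the names `euler_sequence` and `extrapolation_step` are hypothetical. The sketch shows where the two levels of parallelism mentioned above could arise: the r micro-step sequences are independent of each other (a candidate for task parallelism), and each update loops over the components of the ODE system (a candidate for data parallelism).

```python
def euler_sequence(f, t, y, H, n):
    """Approximate y(t + H) with n explicit Euler micro-steps of size H / n."""
    h = H / n
    y = list(y)
    for j in range(n):
        dy = f(t + j * h, y)
        # Component-wise update: a candidate for data parallelism.
        y = [yi + h * di for yi, di in zip(y, dy)]
    return y


def extrapolation_step(f, t, y, H, r=4):
    """One macro-step of size H: r Euler sequences combined by Aitken-Neville."""
    # Assumed step-number sequence (harmonic): 1, 2, ..., r.
    ns = list(range(1, r + 1))

    # The r sequences do not depend on each other: a candidate for task
    # parallelism. Note the uneven cost (sequence i needs ns[i] micro-steps),
    # which is why load balancing matters.
    table = [euler_sequence(f, t, y, H, n) for n in ns]

    # Aitken-Neville extrapolation, done in place. Iterating j downwards
    # ensures table[j - 1] still holds the previous column when it is read.
    for k in range(1, r):
        for j in range(r - 1, k - 1, -1):
            factor = ns[j] / ns[j - k] - 1.0
            table[j] = [a + (a - b) / factor
                        for a, b in zip(table[j], table[j - 1])]

    # The last table entry is the highest-order approximation.
    return table[r - 1]
```

For the scalar test problem y' = y with y(0) = 1, calling `extrapolation_step(lambda t, y: [y[0]], 0.0, [1.0], 0.1, r=4)` returns a value that agrees with e^0.1 to within 1e-6, while `r=1` reduces to a single Euler step. In a parallel version, a synchronization point would be needed between computing the sequences and combining them, which is the kind of overhead the specialized variants in the abstract aim to reduce.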