Asynchronous Work Stealing on Distributed Memory Systems
Authors: Shigang Li, Jingyuan Hu, Xin Cheng, Chongchong Zhao
Venue: 2013 21st Euromicro International Conference on Parallel, Distributed, and Network-Based Processing
Published: 2013-02-27
DOI: 10.1109/PDP.2013.35 (https://doi.org/10.1109/PDP.2013.35)
Work stealing is a popular policy for dynamic load balancing of irregular applications. However, the communication overhead incurred by work stealing may make it less efficient, especially on distributed memory systems. In this work, we propose an asynchronous work stealing (AsynchWS) strategy that exploits opportunities to overlap communication with the execution of local residual tasks. Profiling information is collected locally to optimize task granularity and guide the asynchronous work stealing. AsynchWS is implemented in Unified Parallel C (UPC), which effectively supports non-blocking one-sided communication and facilitates the implementation. Experiments are conducted on a 32-node Xeon X5650 cluster using a set of irregular applications. Results show that AsynchWS achieves up to 16% better performance than state-of-the-art strategies on distributed memory systems.
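The overlap idea in the abstract can be illustrated with a small, deterministic toy model. This is only a sketch under stated assumptions and not the AsynchWS implementation: the function name `run`, its parameters, the unit task cost, the fixed steal latency, and the passive remote pool are all hypothetical simplifications introduced here. A threshold of 0 models stealing only when the local queue is empty, while a positive threshold issues the steal request while that many residual tasks remain, so the transfer proceeds during useful work.

```python
from collections import deque


def run(local_tasks, remote_tasks, chunk, latency, threshold):
    """Simulate one thief draining a passive remote pool of unit-cost tasks.

    Hypothetical toy model: every task takes 1 time unit and every steal
    takes `latency` time units to arrive. Returns (total_time, idle_time).

    threshold == 0 : steal only once the local queue is empty (synchronous).
    threshold  > 0 : issue the steal while `threshold` residual tasks remain,
                     so communication overlaps with local execution.
    """
    local = deque(range(local_tasks))
    remote = remote_tasks
    now = 0
    idle = 0
    in_flight = None  # (arrival_time, number_of_tasks) of a pending steal

    while local or remote or in_flight:
        # Deliver a steal whose transfer has completed.
        if in_flight and in_flight[0] <= now:
            local.extend(range(in_flight[1]))
            in_flight = None

        # Issue a non-blocking steal request when few local tasks remain.
        if in_flight is None and remote > 0 and len(local) <= threshold:
            n = min(chunk, remote)
            remote -= n
            in_flight = (now + latency, n)

        if local:
            # Execute one residual task while any steal is in flight.
            local.popleft()
            now += 1
        elif in_flight:
            # Nothing to run: wait idle until the stolen tasks arrive.
            idle += in_flight[0] - now
            now = in_flight[0]

    return now, idle


# 4 local tasks, 8 remote tasks stolen in chunks of 4, steal latency 3.
print(run(4, 8, 4, 3, threshold=0))  # (18, 6): two steals, 3 idle units each
print(run(4, 8, 4, 3, threshold=3))  # (12, 0): latency hidden behind residual tasks
```

In this toy model the latency is fully hidden once the residual work covers the steal latency. The abstract says that locally collected profiling information guides the stealing; the sketch simply takes the threshold as a fixed input and does not model how it is derived.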