ABC2: Adaptively Balancing Computation and Communication in a DSM Cluster of Multicores for Irregular Applications
Authors: S. C. Koduru, Keval Vora, Rajiv Gupta
Venue: 2014 IEEE International Parallel & Distributed Processing Symposium Workshops
Published: 2014-05-19
DOI: 10.1109/IPDPSW.2014.51 (https://doi.org/10.1109/IPDPSW.2014.51)
Graph-based applications have become increasingly important in many application domains. Large graph sizes offer data-level parallelism at a scale that makes it attractive to run such applications on modern distributed shared memory (DSM) clusters composed of multicore machines. Our analysis of several graph applications that rely on speculative or asynchronous parallelism shows that the balance between computation and communication differs between applications. In this paper, we study this balance in the context of DSMs and exploit the multiple cores present in modern multicore machines by creating three kinds of threads, which allow us to dynamically balance computation and communication: compute threads that exploit data-level parallelism in the computation, fetch threads that replicate data into object-stores before it is accessed by compute threads, and update threads that make results computed by compute threads visible to all compute threads by writing them to the DSM. We observe that the best configuration for the above mechanisms varies across different inputs as well as across different applications. To this end, we design ABC2: a runtime algorithm that automatically configures the DSM using simple runtime information such as the observed object prefetch and update queue lengths. This runtime algorithm achieves speedups close to those of the best hand-optimized configurations.
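The idea of steering the thread mix from queue lengths can be illustrated with a small sketch. This is not the ABC2 algorithm itself, which the abstract does not specify; it is a hypothetical threshold-based controller written only to make the feedback loop concrete. The names `ThreadConfig` and `rebalance`, the `high` and `low` thresholds, and the one-thread-per-step policy are all assumptions introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadConfig:
    """Hypothetical per-machine split of cores among the three thread kinds."""
    compute: int
    fetch: int
    update: int

    @property
    def total(self) -> int:
        return self.compute + self.fetch + self.update


def rebalance(cfg: ThreadConfig, prefetch_qlen: int, update_qlen: int,
              high: int = 64, low: int = 8) -> ThreadConfig:
    """Shift at most one thread per queue, keeping the total thread count fixed.

    Assumed policy (not taken from the paper): a queue longer than `high`
    means its helper threads are falling behind, so one compute thread is
    converted into a helper; a queue shorter than `low` means helpers are
    idle, so one is handed back to computation. At least one thread of each
    kind is always kept.
    """
    compute, fetch, update = cfg.compute, cfg.fetch, cfg.update

    # Prefetch side: grow or shrink the fetch pool.
    if prefetch_qlen > high and compute > 1:
        compute, fetch = compute - 1, fetch + 1
    elif prefetch_qlen < low and fetch > 1:
        compute, fetch = compute + 1, fetch - 1

    # Update side: grow or shrink the update pool.
    if update_qlen > high and compute > 1:
        compute, update = compute - 1, update + 1
    elif update_qlen < low and update > 1:
        compute, update = compute + 1, update - 1

    return ThreadConfig(compute, fetch, update)
```

In such a scheme, a runtime would sample both queue lengths periodically and apply `rebalance` to the current configuration; moving only one thread per queue per step is a deliberately conservative choice here to avoid oscillation, and is likewise an assumption of this sketch.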