{"title":"GPU上动态规划的流水线实现","authors":"Makoto Miyazaki, Susumu Matsumae","doi":"10.1109/CANDARW.2018.00063","DOIUrl":null,"url":null,"abstract":"In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU. As an example, we parallelize a typical DP program where each element of its solution table is calculated in order by semigroup computations among some already computed elements in the table. We implement the DP program on GPU in a pipeline fashion, i.e., we use GPU cores for supporting pipeline-stages so that many elements of the solution table are partially computed in parallel at one time. Our implementation can determine one output value per one computational step, which is faster than the standard parallel implementation whose strategy is to speed up each semi-group computations. We evaluate the performance of our implementation and verify its speedup.","PeriodicalId":329439,"journal":{"name":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","volume":"39 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"A Pipeline Implementation for Dynamic Programming on GPU\",\"authors\":\"Makoto Miyazaki, Susumu Matsumae\",\"doi\":\"10.1109/CANDARW.2018.00063\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU. As an example, we parallelize a typical DP program where each element of its solution table is calculated in order by semigroup computations among some already computed elements in the table. We implement the DP program on GPU in a pipeline fashion, i.e., we use GPU cores for supporting pipeline-stages so that many elements of the solution table are partially computed in parallel at one time. Our implementation can determine one output value per one computational step, which is faster than the standard parallel implementation whose strategy is to speed up each semi-group computations. We evaluate the performance of our implementation and verify its speedup.\",\"PeriodicalId\":329439,\"journal\":{\"name\":\"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)\",\"volume\":\"39 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CANDARW.2018.00063\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 Sixth International Symposium on Computing and Networking Workshops (CANDARW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CANDARW.2018.00063","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Pipeline Implementation for Dynamic Programming on GPU
In this paper, we show the effectiveness of a pipeline implementation of Dynamic Programming (DP) on GPU. As an example, we parallelize a typical DP program where each element of its solution table is calculated in order by semigroup computations among some already computed elements in the table. We implement the DP program on GPU in a pipeline fashion, i.e., we use GPU cores for supporting pipeline-stages so that many elements of the solution table are partially computed in parallel at one time. Our implementation can determine one output value per one computational step, which is faster than the standard parallel implementation whose strategy is to speed up each semi-group computations. We evaluate the performance of our implementation and verify its speedup.