Towards constructing application-level GPU computation states
Yulu Zhang, Xinyuan Guo, Hai Jiang, Kuan-Ching Li
2013 IEEE/ACIS 12th International Conference on Computer and Information Science (ICIS), 2013-06-16
DOI: 10.1109/ICIS.2013.6607834 (https://doi.org/10.1109/ICIS.2013.6607834)
Citations: 2
Abstract
Computation state construction, which saves and restores program state during execution, is an indispensable step toward fault tolerance and computation mobility for scientific applications. As GPUs take on a larger role in high-performance computing, however, no effective state construction scheme exists yet, owing to the GPU's batch-mode execution. The GPU's complex memory hierarchy scatters state across different memory locations that are difficult to fetch, and parallel execution makes it difficult to construct the state of each thread. This paper proposes an application-level computation state construction scheme to support GPU programs. A precompiler and a run-time support module are developed to construct states and save them dynamically in CPU system memory. Memory blocks are registered, and new data structures are proposed to save and restore the computation states represented by variables and pointers on the GPU. Secondary storage can be used for scalability and long-term fault tolerance.
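To make the idea of registering memory blocks and restoring pointers concrete, the following is a minimal sketch in Python, not the paper's implementation. Everything in it is an assumption made for illustration: the names `Block` and `StateRegistry`, the choice to encode a pointer as a (block name, offset) pair, and the use of a `bytearray` as a stand-in for GPU memory. A real system would copy between device and host memory rather than between Python objects.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """One registered memory block (a simulated stand-in for a GPU allocation)."""
    base: int          # simulated device address of the first byte
    data: bytearray    # simulated device contents


@dataclass
class StateRegistry:
    """Hypothetical registry of memory blocks; not the paper's data structure."""
    blocks: dict = field(default_factory=dict)

    def register(self, name, base, size):
        """Record a block so its contents can be saved and restored later."""
        self.blocks[name] = Block(base, bytearray(size))

    def encode_pointer(self, address):
        """Turn a raw address into a portable (block name, offset) pair."""
        for name, blk in self.blocks.items():
            if blk.base <= address < blk.base + len(blk.data):
                return name, address - blk.base
        raise ValueError(f"address {address:#x} is not in a registered block")

    def decode_pointer(self, ref):
        """Map a (block name, offset) pair back to an address in the current layout."""
        name, offset = ref
        return self.blocks[name].base + offset

    def save(self):
        """Copy every block into host memory (here: immutable bytes objects)."""
        return {name: bytes(blk.data) for name, blk in self.blocks.items()}

    def restore(self, snapshot, new_bases):
        """Rebuild the blocks from a snapshot at possibly different base addresses."""
        self.blocks = {
            name: Block(new_bases[name], bytearray(content))
            for name, content in snapshot.items()
        }


# Usage: save a block and a pointer into it, then restore at a new address.
reg = StateRegistry()
reg.register("grid", base=0x1000, size=16)
reg.blocks["grid"].data[4] = 42
ptr = reg.encode_pointer(0x1004)          # pointer to the fifth byte of "grid"
snapshot = reg.save()

reg.restore(snapshot, {"grid": 0x8000})   # the block is re-created elsewhere
new_addr = reg.decode_pointer(ptr)        # the pointer follows the block
```

Storing pointers relative to a registered block, rather than as raw addresses, is one common way to keep them valid when memory is re-allocated on restore; whether the paper uses this exact encoding is not stated in the abstract.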