Revina Awalia Putri, Idris Winarno, Wiratmoko Yuwono, Agus Priyo Utomo
Implementation of Resilience as a Service for Parallel Computing
2020 International Electronics Symposium (IES), published 2020-09-01
DOI: 10.1109/IES50839.2020.9231708
Citations: 1
Abstract
Interest in parallel computing has been growing since multi-core processors became available at prices affordable to ordinary users, and parallel computing offers shorter processing times than serial computing. However, shorter processing time does not make parallel computing immune to failures. A failure stops the parallel computation, and the time needed to complete it then increases by the time required to detect the failure (that is, to realize that the process has stalled), the time required to repair the system and run the process again, and the processing time already spent, which is wasted when the process must be repeated from the beginning. To address this, a new resilience system for parallel computing is implemented by adopting RaaS (Resilience as a Service). In a RaaS-enabled system, checkpoints are taken periodically and RaaS monitors the system to detect failures. When a failure occurs, RaaS automatically runs a recovery mechanism that replaces the failed instance and resumes the process from the most recent checkpoint. The experiment shows that RaaS can be implemented for parallel computing and that failures are handled with a shorter processing time: with RaaS, failures are detected sooner and the process no longer has to be repeated from the beginning.
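The abstract does not say how RaaS checkpoints, monitors, or recovers instances, so the following is only a minimal deterministic simulation under assumed mechanics, not the authors' implementation. It assumes that each hypothetical `Instance` processes one partition of the job one step per tick, that progress is checkpointed every `checkpoint_every` steps, that a monitor declares an unfinished instance failed once its last heartbeat is more than `heartbeat_timeout` ticks old, and that recovery replaces the instance and resumes from its last checkpoint. All names and parameters (`Instance`, `run_job`, `crash_at`, and so on) are illustrative. Setting `checkpoint_every` larger than the job length reproduces the restart-from-the-beginning case described above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Instance:
    """Hypothetical compute instance processing one partition of a parallel job."""
    wid: int
    total_steps: int
    crash_at: Optional[int] = None  # simulated fault: crash once on reaching this step
    step: int = 0
    alive: bool = True
    last_heartbeat: int = 0

    def tick(self, now: int) -> bool:
        """Advance one step; return True if progress was made."""
        if not self.alive:
            return False  # a crashed instance makes no progress and sends no heartbeat
        progressed = False
        if self.step < self.total_steps:
            if self.crash_at == self.step:
                self.alive = False
                return False
            self.step += 1
            progressed = True
        self.last_heartbeat = now  # live instances, even finished ones, keep heartbeating
        return progressed


def run_job(n_instances: int, steps: int, crashes: Dict[int, int],
            checkpoint_every: int, heartbeat_timeout: int) -> Tuple[int, int]:
    """Simulate a job under assumed RaaS-style supervision.

    Returns (completion_time_in_ticks, steps_recomputed_after_failures).
    """
    instances = {w: Instance(w, steps, crashes.get(w)) for w in range(n_instances)}
    checkpoints = {w: 0 for w in range(n_instances)}  # last checkpointed step per instance
    now = recomputed = 0
    while any(inst.step < steps for inst in instances.values()):
        now += 1
        # 1. Computation with periodic checkpointing.
        for wid, inst in instances.items():
            if inst.tick(now) and inst.step % checkpoint_every == 0:
                checkpoints[wid] = inst.step
        # 2. Monitoring: an unfinished instance with a stale heartbeat is declared failed.
        for wid, inst in list(instances.items()):
            if inst.step < steps and now - inst.last_heartbeat > heartbeat_timeout:
                # 3. Recovery: replace the instance and resume from its last checkpoint.
                recomputed += inst.step - checkpoints[wid]
                instances[wid] = Instance(wid, steps, step=checkpoints[wid],
                                          last_heartbeat=now)
    return now, recomputed


# Instance 2 crashes at step 57 of 100; compare checkpointed recovery with a full restart.
with_raas = run_job(4, 100, {2: 57}, checkpoint_every=10, heartbeat_timeout=3)
from_scratch = run_job(4, 100, {2: 57}, checkpoint_every=101, heartbeat_timeout=3)
print(with_raas, from_scratch)  # (111, 7) (161, 57)
```

Both runs use the same detection timeout, so the 50-tick gap comes entirely from recomputation (57 steps versus 7). Raising `heartbeat_timeout` for the baseline run would model the other gain the abstract claims, faster failure detection.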