Learning Low-Wastage Memory Allocations for Scientific Workflows at IceCube
Carl Witt, J. Santen, U. Leser
2019 International Conference on High Performance Computing & Simulation (HPCS), July 2019
DOI: 10.1109/HPCS48598.2019.9188126
Abstract: In scientific computing, scheduling tasks with heterogeneous resource requirements still requires users to estimate the resource usage of tasks. These estimates tend to be inaccurate despite the laborious manual processes used to derive them. We show that machine learning outperforms user estimates and that models trained at runtime improve resource allocation for workflows. We focus on allocating main memory in batch systems, which enforce resource limits by terminating jobs. The key idea is to train prediction models that minimize the costs resulting from prediction errors rather than the prediction errors themselves. In addition, we detect and exploit opportunities to predict the resource usage of individual tasks based on their input size. We evaluated our approach on a 10-month production log from the IceCube South Pole Neutrino Observatory experiment, comparing it against the current production system and a state-of-the-art method. We show that memory allocation quality can be increased from about 50% to 70%, while allowing users to provide only rough estimates of resource usage.
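The core idea in the abstract — optimizing the cost of allocation decisions rather than raw prediction error — can be illustrated with a deliberately simplified sketch. The data, the linear predictor, and the cost model (unused memory is wasted on success; an under-allocated job is killed and retried, doubling its charge) are all hypothetical stand-ins, not the paper's actual estimator or the IceCube production setup:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic training data (illustrative only): task input size in GB
# versus observed peak memory usage in GB.
input_size = rng.uniform(1.0, 10.0, size=200)
peak_mem = 0.5 * input_size + 1.0 + rng.normal(0.0, 0.3, size=200)

# Step 1: ordinary least-squares fit of peak memory against input size.
A = np.column_stack([input_size, np.ones_like(input_size)])
slope, intercept = np.linalg.lstsq(A, peak_mem, rcond=None)[0]
base_pred = slope * input_size + intercept

def wastage(margin):
    """Total cost of allocating base_pred + margin for every task.

    Over-allocation wastes the unused memory; under-allocation kills
    the job, wasting its whole allocation, and the retry doubles the
    charge. A deliberately simplified, asymmetric cost model.
    """
    alloc = base_pred + margin
    ok = alloc >= peak_mem
    over = np.where(ok, alloc - peak_mem, 0.0).sum()
    failed = np.where(~ok, 2.0 * alloc, 0.0).sum()
    return over + failed

# Step 2: choose the safety margin that minimizes the simulated
# wastage on the training log -- i.e. minimize allocation cost,
# not prediction error.
margins = np.linspace(0.0, 2.0, 201)
best_margin = min(margins, key=wastage)
print(f"slope={slope:.2f} GB/GB, intercept={intercept:.2f} GB, "
      f"margin={best_margin:.2f} GB")
```

Because failures are charged more heavily than unused memory, the selected margin sits above zero: the cost-aware objective deliberately over-predicts a little, which a plain least-squares fit would never do on its own.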