Machine Learning Based Performance Analysis and Prediction of Jobs on a HPC Cluster
Zhengxiong Hou, Shuxin Zhao, Chao Yin, Yunlan Wang, Jianhua Gu, Xingshe Zhou
2019 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), December 2019
DOI: 10.1109/PDCAT46702.2019.00053
Universities, research institutes, and similar organizations operate many small and medium-sized high-performance computing (HPC) clusters, and after years of operation these clusters have accumulated large volumes of job logs. In this paper, we examine and analyze the job logs accumulated on one HPC cluster. We then study machine-learning-based methods for analyzing and predicting the performance of parallel jobs. Several machine learning methods, including multivariate linear fitting and artificial neural networks, are used to build performance prediction models. We compare the prediction errors of these models and select the best model for each user. Experimental results show that the selected machine learning algorithms achieve reasonable prediction accuracy.
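The abstract describes fitting several candidate models to historical job logs, comparing their errors, and keeping the best model per user. The following is a minimal sketch of that workflow under assumptions that are not taken from the paper: each job is reduced to two hypothetical features (requested cores and requested walltime) with runtime as the target, a per-user mean-runtime baseline stands in for the other candidates (the paper's artificial neural network is not reproduced), and mean absolute error on each user's most recent jobs is the comparison metric. The linear model is ordinary least squares solved through the normal equations, using only the Python standard library.

```python
from statistics import mean

# Assumption (not from the paper): a job record is ((cores, walltime_min), runtime_min),
# and each user's list is ordered by submission time.


def solve_linear_system(a, b):
    """Solve a @ x = b with Gaussian elimination and partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: features are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(features, targets):
    """Multivariate linear fit with intercept via the normal equations X^T X w = X^T y."""
    rows = [[1.0] + list(f) for f in features]  # prepend an intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_linear(weights, feature):
    return weights[0] + sum(w * v for w, v in zip(weights[1:], feature))


def mean_absolute_error(predicted, actual):
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def select_model_per_user(jobs_by_user, holdout=2):
    """For each user, return (model_name, MAE) of the candidate with the lowest
    error on that user's newest `holdout` jobs. Each user needs > holdout jobs."""
    selected = {}
    for user, jobs in jobs_by_user.items():
        train, test = jobs[:-holdout], jobs[-holdout:]
        feats = [f for f, _ in train]
        runs = [y for _, y in train]
        actual = [y for _, y in test]
        # Candidate 1: predict the user's mean historical runtime (simple baseline).
        baseline = mean(runs)
        errors = {"mean": mean_absolute_error([baseline] * len(test), actual)}
        # Candidate 2: linear fit, only when there are more jobs than coefficients.
        if len(train) > len(feats[0]) + 1:
            try:
                w = fit_linear(feats, runs)
                preds = [predict_linear(w, f) for f, _ in test]
                errors["linear"] = mean_absolute_error(preds, actual)
            except ValueError:
                pass  # collinear features: keep only the baseline
        best = min(errors, key=errors.get)
        selected[user] = (best, errors[best])
    return selected


if __name__ == "__main__":
    # Synthetic jobs for illustration only; not data from the paper.
    jobs = {
        "alice": [((1, 10), 17), ((2, 20), 24), ((4, 10), 23),
                  ((8, 40), 46), ((3, 30), 31), ((6, 20), 32)],
        "bob": [((1, 10), 50), ((2, 10), 52), ((1, 20), 48),
                ((2, 20), 50), ((8, 100), 51), ((16, 5), 49)],
    }
    for user, (model, err) in select_model_per_user(jobs).items():
        print(f"{user}: {model} (MAE {err:.2f})")
```

With this synthetic data, the linear fit wins for the user whose runtimes are an exact linear function of the two features, while the mean baseline wins for the user whose runtimes barely depend on them; in the latter case the linear fit reproduces the training jobs exactly but extrapolates poorly. Selecting per user rather than globally lets each user keep whichever model actually generalizes for their workload.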