A Comprehensive Analysis of User Job Data on a Petascale Supercomputer Dedicated to CFD
Wenxiang Yang, Zhigong Yang, Yongguo Zhou, F. Wang, Cheng Chen, Yueqing Wang
2019 IEEE 5th International Conference on Computer and Communications (ICCC), pp. 86-91, December 2019
DOI: 10.1109/ICCC47050.2019.9064094
Citation count: 1
Abstract
High performance computing (HPC) systems play a crucial role in running large-scale scientific applications, and improving their efficiency is imperative. This paper aims to comprehensively characterize jobs and the factors that affect system efficiency and performance, laying a solid foundation for proposing and evaluating job scheduling and resource management methods. To this end, we collect two years of job data from a petascale HPC system dedicated to computational fluid dynamics (CFD) applications and, based on this dataset, conduct a detailed analysis of failed jobs and waiting times. Our analysis reveals several important characteristics of submitted jobs, which can not only help system owners understand how CFD applications use the system, but also provide guidance for optimizing job scheduling and resource management algorithms.