{"title":"高性能计算系统的冷感知作业调度和节点分配","authors":"Thang Cao, Wei Huang, Yuan He, Masaaki Kondo","doi":"10.1109/IPDPS.2017.19","DOIUrl":null,"url":null,"abstract":"Limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning which installs a larger number of nodes beyond the limitations of the power constraint is an attractive way to design next generation supercomputers. In air cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and effectively utilizing power resource for computing nodes are important challenges. It is known that the cooling power depends on the hotspot temperature of the node inlets. Therefore, if we minimize the hotspot temperature, performance efficiency of the HPC system will be increased. One of the ways to reduce the hotspot temperature is to allocate power-hungry jobs to compute nodes whose effect on the hotspot temperature is small. It can be accomplished by optimizing job-to-node mapping in the job scheduler. In this paper, we propose a cooling and node location-aware job scheduling strategy which tries to optimize job-to-node mapping while improving the total system throughput under the constraint of total system (compute nodes and cooling facilities) power consumption. Experimental results with the job scheduling simulation show that our scheduling scheme achieves 1.49X higher total system throughput than the conventional scheme.","PeriodicalId":209524,"journal":{"name":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","volume":"33 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"21","resultStr":"{\"title\":\"Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems\",\"authors\":\"Thang Cao, Wei Huang, Yuan He, Masaaki Kondo\",\"doi\":\"10.1109/IPDPS.2017.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning which installs a larger number of nodes beyond the limitations of the power constraint is an attractive way to design next generation supercomputers. In air cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and effectively utilizing power resource for computing nodes are important challenges. It is known that the cooling power depends on the hotspot temperature of the node inlets. Therefore, if we minimize the hotspot temperature, performance efficiency of the HPC system will be increased. One of the ways to reduce the hotspot temperature is to allocate power-hungry jobs to compute nodes whose effect on the hotspot temperature is small. It can be accomplished by optimizing job-to-node mapping in the job scheduler. In this paper, we propose a cooling and node location-aware job scheduling strategy which tries to optimize job-to-node mapping while improving the total system throughput under the constraint of total system (compute nodes and cooling facilities) power consumption. Experimental results with the job scheduling simulation show that our scheduling scheme achieves 1.49X higher total system throughput than the conventional scheme.\",\"PeriodicalId\":209524,\"journal\":{\"name\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"volume\":\"33 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"21\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/IPDPS.2017.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPS.2017.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Cooling-Aware Job Scheduling and Node Allocation for Overprovisioned HPC Systems
Limited power budget is becoming one of the most crucial challenges in developing supercomputer systems. Hardware overprovisioning which installs a larger number of nodes beyond the limitations of the power constraint is an attractive way to design next generation supercomputers. In air cooled HPC centers, about half of the total power is consumed by cooling facilities. Reducing cooling power and effectively utilizing power resource for computing nodes are important challenges. It is known that the cooling power depends on the hotspot temperature of the node inlets. Therefore, if we minimize the hotspot temperature, performance efficiency of the HPC system will be increased. One of the ways to reduce the hotspot temperature is to allocate power-hungry jobs to compute nodes whose effect on the hotspot temperature is small. It can be accomplished by optimizing job-to-node mapping in the job scheduler. In this paper, we propose a cooling and node location-aware job scheduling strategy which tries to optimize job-to-node mapping while improving the total system throughput under the constraint of total system (compute nodes and cooling facilities) power consumption. Experimental results with the job scheduling simulation show that our scheduling scheme achieves 1.49X higher total system throughput than the conventional scheme.