Energy-aware neural architecture selection and hyperparameter optimization

Nathan C Frey, Dan Zhao, Simon Axelrod, Michael Jones, David Bestor, V. Gadepally, Rafael Gómez-Bombarelli, S. Samsi

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2022. DOI: 10.1109/IPDPSW55747.2022.00125
Artificial Intelligence (AI), and Deep Learning in particular, has rapidly growing computational requirements, with a corresponding increase in energy consumption. There is a tremendous opportunity to reduce the computational cost and environmental impact of deep learning by accelerating neural network architecture search and hyperparameter optimization, as well as by explicitly designing neural architectures that optimize for both energy efficiency and performance. Here, we introduce a framework called training performance estimation (TPE), which builds upon existing techniques for training speed estimation in order to monitor energy consumption and rank model performance without training models to convergence, saving up to 90% of the time and energy of the full training budget. We benchmark TPE in the computationally intensive, well-studied domain of computer vision and in the emerging field of graph neural networks for machine-learned interatomic potentials, an important domain for scientific discovery with heavy computational demands. We propose variants of early stopping that generalize this common regularization technique to account for energy costs, and we study the energy costs of deploying increasingly complex, knowledge-informed architectures for AI-accelerated molecular dynamics and image classification. Our work enables immediate, significant energy savings across the entire pipeline of model development and deployment and suggests new research directions for energy-aware, knowledge-informed model architecture development.
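To make the energy-aware early-stopping idea from the abstract concrete, below is a minimal sketch, not the authors' TPE implementation. It assumes the caller supplies cumulative energy readings in joules (e.g. from NVML or a wall-plug meter; the measurement source is left abstract here), and it halts training when the marginal improvement in validation loss per kilojoule falls below a chosen threshold, rather than when the loss alone plateaus. All parameter names and the usage numbers are illustrative.

```python
# Illustrative sketch of an energy-aware early-stopping criterion.
# Stops when validation improvement per unit of consumed energy becomes negligible.

class EnergyAwareEarlyStopping:
    def __init__(self, min_improvement_per_kj=1e-3, patience=3):
        # min_improvement_per_kj: required validation-loss decrease per kilojoule of energy.
        # patience: consecutive low-efficiency evaluations tolerated before stopping.
        self.min_improvement_per_kj = min_improvement_per_kj
        self.patience = patience
        self.best_loss = float("inf")
        self.last_energy_j = 0.0
        self.strikes = 0

    def should_stop(self, val_loss, cumulative_energy_j):
        """Return True if training should stop.

        val_loss: current validation loss.
        cumulative_energy_j: total energy consumed so far, in joules,
            as reported by whatever power-measurement tool is available.
        """
        delta_energy_kj = max(cumulative_energy_j - self.last_energy_j, 1e-9) / 1e3
        improvement = self.best_loss - val_loss
        efficiency = improvement / delta_energy_kj  # loss decrease per kJ spent

        self.best_loss = min(self.best_loss, val_loss)
        self.last_energy_j = cumulative_energy_j

        if efficiency < self.min_improvement_per_kj:
            self.strikes += 1
        else:
            self.strikes = 0
        return self.strikes >= self.patience


# Example usage with made-up validation losses and energy readings (joules):
stopper = EnergyAwareEarlyStopping(min_improvement_per_kj=0.01, patience=2)
for loss, energy in [(0.90, 5e3), (0.70, 1.0e4), (0.69, 1.5e4), (0.689, 2.0e4)]:
    if stopper.should_stop(loss, energy):
        print("stopping: marginal loss improvement no longer worth the energy")
        break
```

The same bookkeeping could drive TPE-style model selection: train each candidate architecture for only a small fraction of the budget, record partial validation performance alongside consumed energy, and rank candidates on both axes instead of training every model to convergence.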