Machine learning for accurate and fast bandgap prediction of solid-state materials
Shomik Verma, S. Kajale, Rafael Gómez-Bombarelli
2022 IEEE High Performance Extreme Computing Conference (HPEC), 2022-09-19
https://doi.org/10.1109/HPEC55821.2022.9926355
Semi-local DFT tends to vastly underestimate the bandgaps of materials. Here we propose a machine learning calibration workflow to improve the accuracy of cheap DFT calculations. We first compile a dataset of 25k materials for which both PBE and HSE calculations have been completed. Using this dataset, we benchmark various machine learning architectures and features to determine which combination yields the highest accuracy. The best technique improves the accuracy of PBE 10-fold. We then extend the generalizability of the model by using active learning to intelligently sample chemical space. Because HSE data are not available for these new materials, we develop an optimized, parallelized high-throughput workflow to calculate HSE bandgaps of 10k additional materials. We thereby develop a cheap, accurate, and generalizable ML model for bandgap prediction.
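
The calibration idea can be illustrated with a minimal sketch. The abstract does not say which architectures or features were benchmarked, so the example below swaps in the simplest possible stand-in: an ordinary least-squares linear map from PBE to HSE bandgaps. The data are synthetic, the slope, offset, and noise level used to generate them are arbitrary assumptions, and the function names are hypothetical; none of this is the authors' model or dataset.

```python
import random


def fit_linear_calibration(pbe_gaps, hse_gaps):
    """Ordinary least-squares fit of hse ~ slope * pbe + intercept."""
    n = len(pbe_gaps)
    mean_x = sum(pbe_gaps) / n
    mean_y = sum(hse_gaps) / n
    sxx = sum((x - mean_x) ** 2 for x in pbe_gaps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pbe_gaps, hse_gaps))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_absolute_error(predicted, reference):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Synthetic stand-in data (assumption): HSE gaps are generated as a noisy
# linear function of PBE gaps. Real PBE/HSE pairs are not this simple.
rng = random.Random(0)
pbe = [rng.uniform(0.0, 5.0) for _ in range(200)]
hse = [1.3 * x + 0.6 + rng.gauss(0.0, 0.1) for x in pbe]

# Hold out the last 50 materials to evaluate the calibration.
train_pbe, test_pbe = pbe[:150], pbe[150:]
train_hse, test_hse = hse[:150], hse[150:]

slope, intercept = fit_linear_calibration(train_pbe, train_hse)
calibrated = [slope * x + intercept for x in test_pbe]

mae_raw = mean_absolute_error(test_pbe, test_hse)           # uncorrected PBE vs HSE
mae_calibrated = mean_absolute_error(calibrated, test_hse)  # corrected PBE vs HSE

print(f"slope={slope:.3f} intercept={intercept:.3f}")
print(f"MAE raw PBE: {mae_raw:.3f} eV, MAE calibrated: {mae_calibrated:.3f} eV")
```

On real data, the linear map would be replaced by whichever model and feature set performs best in a benchmark, but the evaluation pattern stays the same: compare the held-out error of raw PBE against the held-out error of the calibrated prediction.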