A study of machine-learning-derived formulas using artificially generated dataset
Authors: Donggeon Lee, Sooran Kim
Journal: Journal of the Korean Physical Society (impact factor 0.8; JCR Q3, Physics, Multidisciplinary)
Published: 2024-05-30
DOI: 10.1007/s40042-024-01103-w
URL: https://link.springer.com/article/10.1007/s40042-024-01103-w
Citations: 0
Abstract
In this study, we investigate how effectively machine learning (ML) models can construct empirical formulas for the superconducting transition temperature (Tc) by comparing ML-derived equations with McMillan’s equation. We generated an artificial dataset of 10,000 points from McMillan’s equation and used a parametric brute-force search (BFS) algorithm to find model equations while varying model complexity and dataset size. BFS models that use the Debye temperature and the electron–phonon coupling as features achieve an RMSE of 0.830 K and an R² of 0.976 even with a dataset of only 100 points. The ML-derived formula is also close to McMillan’s equation: it shows a linear relationship between the Debye temperature and Tc and a cubic relationship between the electron–phonon coupling and Tc. Furthermore, we analyzed feature contributions using non-parametric random forest (RF) regression and found that the electron–phonon coupling is strongly relevant to Tc. Our results demonstrate the importance of feature selection and model complexity, rather than simply adding more data, in effectively predicting Tc.
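The data-generation step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' BFS implementation. It uses the standard form of McMillan's equation, Tc = (Θ_D / 1.45) exp[-1.04 (1 + λ) / (λ - μ*(1 + 0.62 λ))]. The sampling ranges, the fixed μ* = 0.13, and the helper names `generate_dataset`, `fit_power_law`, and `rmse_and_r2` are assumptions made for illustration. The simple log-linear power-law fit stands in for a formula search and is not expected to reproduce the exponents or error metrics reported in the paper.

```python
import numpy as np


def mcmillan_tc(theta_d, lam, mu_star=0.13):
    """McMillan's equation for T_c (K).

    theta_d: Debye temperature (K); lam: electron-phonon coupling;
    mu_star: Coulomb pseudopotential (0.13 is an assumed default here).
    """
    theta_d = np.asarray(theta_d, dtype=float)
    lam = np.asarray(lam, dtype=float)
    denom = lam - mu_star * (1.0 + 0.62 * lam)
    if np.any(denom <= 0):
        raise ValueError("requires lam > mu_star * (1 + 0.62 * lam)")
    return theta_d / 1.45 * np.exp(-1.04 * (1.0 + lam) / denom)


def generate_dataset(n, seed=0):
    """Sample features uniformly and label them with McMillan's equation.

    The ranges below are illustrative assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    theta_d = rng.uniform(100.0, 500.0, size=n)  # assumed range, in K
    lam = rng.uniform(0.3, 1.5, size=n)          # assumed range
    return theta_d, lam, mcmillan_tc(theta_d, lam)


def fit_power_law(theta_d, lam, tc):
    """Least-squares fit of log T_c = log a + p log theta_d + q log lam."""
    design = np.column_stack([np.ones_like(tc), np.log(theta_d), np.log(lam)])
    coef, *_ = np.linalg.lstsq(design, np.log(tc), rcond=None)
    return np.exp(coef[0]), coef[1], coef[2]


def rmse_and_r2(y_true, y_pred):
    """Root-mean-square error and coefficient of determination."""
    resid = y_true - y_pred
    rmse = np.sqrt(np.mean(resid ** 2))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return rmse, r2


if __name__ == "__main__":
    theta_d, lam, tc = generate_dataset(10_000)
    a, p, q = fit_power_law(theta_d, lam, tc)
    rmse, r2 = rmse_and_r2(tc, a * theta_d ** p * lam ** q)
    print(f"T_c ~ {a:.3g} * theta_D^{p:.2f} * lambda^{q:.2f}")
    print(f"RMSE = {rmse:.3f} K, R^2 = {r2:.3f}")
```

Because McMillan's equation is exactly proportional to the Debye temperature, the fitted exponent on theta_d should come out close to 1. The exponent on lambda depends on the assumed sampling range, since the true dependence is exponential rather than a pure power law.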
About the journal:
The Journal of the Korean Physical Society (JKPS) covers all fields of physics, from statistical physics and condensed matter physics to particle physics. Manuscripts published in JKPS are required to demonstrate originality, significance, and up-to-date completeness. The journal is composed of Full Paper, Letters, and Brief sections. In addition, featured articles with outstanding results are selected by the Editorial Board and introduced in the online version. To emphasize its international character, several world-distinguished researchers serve on the Editorial Board. High-quality papers may be published on an expedited basis when this is recommended or requested.