Oxidation states in solids from data-driven paradigms
Authors: Yue Yin, Hai Xiao
Journal: Chemical Science (impact factor 7.4; JCR Q1, Chemistry, Multidisciplinary)
Publication date: 2025-09-29
DOI: 10.1039/d5sc05694b (https://doi.org/10.1039/d5sc05694b)
Citations: 0
Abstract
The oxidation state (OS) is an essential chemical concept that embodies chemical intuition but cannot be computed from well-defined physical laws. We establish a data-driven paradigm, implemented as Tsinghua Oxidation States in Solids (TOSS), to explicitly compute OSs in crystal structures as emergent properties of large datasets, based on Bayesian maximum a posteriori (MAP) probability. TOSS employs two looping structures over the large dataset of crystal structures to obtain an emergent library of distance distributions as the foundation for chemically intuitive understanding, and then determines the OSs by minimizing, for each structure, a loss function based on MAP and the distance distributions in the whole dataset. We apply TOSS to a dataset of over one million crystal structures, achieving a superior success rate, and use the resulting OS dataset to train a graph convolutional network (GCN) model as an alternative. Both TOSS and the GCN model are benchmarked against a curated ICSD dataset of structures with human-assigned OSs, yielding high accuracies of 96.09% and 97.24%, respectively. We expect TOSS and the ML-model-based alternative to find a wide spectrum of applications; this work also offers an encouraging example of how data-driven paradigms can explicitly compute chemical intuition to tackle complex problems in chemistry.
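To make the idea of "minimizing a loss function based on MAP and distance distributions" concrete, here is a minimal Python sketch. It is not the TOSS implementation and does not reproduce the two looping structures. The Gaussian form of the distance distributions, the prior values, the charge-neutrality filter, and every name (`DISTANCE_LIBRARY`, `PRIOR`, `loss`, `map_assign`) are assumptions made for illustration, and the numbers are invented rather than taken from the paper.

```python
import itertools
import math

# Hypothetical toy library: (element, OS) -> (mean, std) of cation-O distance in angstrom.
# Illustrative values only; in a data-driven scheme these would emerge from the dataset.
DISTANCE_LIBRARY = {
    ("Fe", 2): (2.15, 0.08),
    ("Fe", 3): (2.02, 0.08),
    ("Ti", 3): (2.05, 0.08),
    ("Ti", 4): (1.96, 0.08),
}

# Hypothetical prior probabilities P(OS | element).
PRIOR = {
    ("Fe", 2): 0.45,
    ("Fe", 3): 0.55,
    ("Ti", 3): 0.20,
    ("Ti", 4): 0.80,
}


def neg_log_gaussian(x, mean, std):
    """Negative log of a Gaussian density (assumed likelihood form)."""
    return 0.5 * ((x - mean) / std) ** 2 + math.log(std * math.sqrt(2 * math.pi))


def loss(assignment, sites):
    """Negative log posterior (up to a constant) of one OS assignment."""
    total = 0.0
    for (element, distances), os_value in zip(sites, assignment):
        total -= math.log(PRIOR[(element, os_value)])  # prior term
        mean, std = DISTANCE_LIBRARY[(element, os_value)]
        total += sum(neg_log_gaussian(d, mean, std) for d in distances)  # likelihood
    return total


def map_assign(sites, anion_charge):
    """Return (assignment, loss) with the lowest loss among charge-neutral
    candidates, or None if no candidate is charge neutral."""
    candidates = [
        [os_value for (el, os_value) in PRIOR if el == element]
        for element, _ in sites
    ]
    best = None
    for assignment in itertools.product(*candidates):
        if sum(assignment) + anion_charge != 0:
            continue  # assumed constraint: total charge must vanish
        value = loss(assignment, sites)
        if best is None or value < best[1]:
            best = (assignment, value)
    return best


# Toy FeTiO3-like input: one Fe site, one Ti site, three O(2-) anions.
sites = [("Fe", [2.14, 2.16, 2.13]), ("Ti", [1.97, 1.95, 1.98])]
result = map_assign(sites, anion_charge=-6)
print(result[0])
```

With these toy numbers the charge-neutral candidates are Fe(II)/Ti(IV) and Fe(III)/Ti(III), and the observed distances favor the former. Exhaustive enumeration is used only because the example is tiny; a real structure with many sites would need a smarter search.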
About the journal:
Chemical Science is a journal that encompasses various disciplines within the chemical sciences. Its scope includes publishing ground-breaking research with significant implications for its respective field, as well as appealing to a wider audience in related areas. To be considered for publication, articles must showcase innovative and original advances in their field of study and be presented in a manner that is understandable to scientists from diverse backgrounds. However, the journal generally does not publish highly specialized research.