A representation of crystal systems and space groups based on the variance of atomic positions (VAP): Case of 2D materials
R. Botella
Solid State Sciences, volume 169, Article 108061, published 2025-09-02
DOI: 10.1016/j.solidstatesciences.2025.108061
https://www.sciencedirect.com/science/article/pii/S1293255825002390
Citations: 0
Abstract
Material representation is an active topic in computational materials science. It is especially useful for predicting crystal structures and related properties such as formation energy and band structure. A material representation aims to encode the structure of a material in a format that machine learning (ML) algorithms can readily process. Herein, a new material representation is proposed that is both compact and easy for ML algorithms to use. In the proposed representation, the connectivity and symmetry of a material are left implicit, and the unit cell is treated as a 3D cloud of points. One way to describe such a distribution of values is through its variance; applied to atomic positions, this gives the variance of atomic positions (VAP). The VAP representations of 6176 2D materials from the 2DMatPedia database are computed and studied. After visual inspection, the VAP values are sorted by crystal system and space group and then classified through ML. K-nearest neighbors (KNN) and random forest (RF) algorithms are trained and tested for crystal system and space group classification. Although visual inspection shows no correlation between VAP and crystal system or between VAP and space group, classification accuracies range from ca. 92 % to ca. 97 % for pairwise crystal system classification and from 96 % to 98 % for pairwise space group classification. Multi-class classification accuracies are also high, ranging from 79 % to 88 % for crystal systems and from 73 % to 82 % for space groups. To deepen the understanding of the classification process, accuracy maps are analyzed, uncovering a bias in the dataset. The influence of this bias on the classification accuracies is discussed.
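As an illustration of the idea described in the abstract, the following is a minimal sketch and not the author's implementation. The abstract does not state the exact VAP formula, so this example assumes the variance is taken separately along each Cartesian axis; the function names `vap` and `knn_predict`, the toy cells, and the labels "flat" and "buckled" are all hypothetical. The paper itself classifies crystal systems and space groups on 2DMatPedia data.

```python
import math
from collections import Counter
from statistics import pvariance


def vap(positions):
    """Per-axis population variance of a list of (x, y, z) atomic positions.

    Assumption: the abstract gives no explicit formula, so each Cartesian
    coordinate is treated as its own distribution of values.
    """
    xs, ys, zs = zip(*positions)
    return (pvariance(xs), pvariance(ys), pvariance(zs))


def knn_predict(train_features, train_labels, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy cells (not taken from 2DMatPedia), positions in angstrom.
flat_cell = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
buckled_cell = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.5), (0.0, 1.0, 0.5), (1.0, 1.0, 0.0)]

# Each cell collapses to a three-number feature vector.
features = [vap(flat_cell), vap(buckled_cell)]
labels = ["flat", "buckled"]

# Classify an unseen feature vector by its single nearest neighbor.
predicted = knn_predict(features, labels, (0.25, 0.25, 0.01), k=1)
```

Whatever the number of atoms, each cell reduces to three numbers here, which is one way to read the abstract's claim of compactness. A real study would replace the toy nearest-neighbor vote with a tested KNN or RF implementation and a proper train/test split.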
About the journal:
Solid State Sciences is the journal for researchers from the broad solid state chemistry and physics community. It publishes key articles on all aspects of solid state synthesis, structure-property relationships, theory and functionalities, in relation to experiments.
Key topics for stand-alone papers and special issues:
-Novel ways of synthesis, inorganic functional materials, including porous and glassy materials, hybrid organic-inorganic compounds and nanomaterials
-Physical properties, emphasizing but not limited to the electrical, magnetic and optical features
-Materials related to information technology and energy and environmental sciences.
The journal publishes feature articles from experts in the field upon invitation.
Solid State Sciences - your gateway to energy-related materials.