Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties
Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst
arXiv - PHYS - Atomic and Molecular Clusters, 2023-09-17, https://doi.org/arxiv-2309.09355
Abstract
The application of machine learning (ML) techniques in computational
chemistry has led to significant advances in predicting molecular properties,
accelerating drug discovery, and designing materials. ML models can extract hidden
patterns and relationships from complex and large datasets, allowing for the
prediction of various chemical properties with high accuracy. The use of such
methods has enabled the discovery of molecules and materials that were
previously difficult to identify. This paper introduces a new ML model for
classification tasks that is based on deep learning, using a multilayer encoder
and decoder architecture. We demonstrate the opportunities
offered by our approach by applying it to various types of input data,
including organic and inorganic compounds. In particular, we developed and
tested the model on the Matbench and MoleculeNet benchmarks, which cover
crystal-property and drug-design tasks. We also conduct a
comprehensive analysis of vector representations of chemical compounds,
shedding light on the underlying patterns in molecular data. The models used in
this work exhibit a high degree of predictive power, underscoring the progress
that can be made when refined machine learning methods are applied to molecular and
materials datasets. For instance, on the Tox21 dataset, we achieved an average
accuracy of 96%, surpassing the previous best result by 10%. Our code is
publicly available at https://github.com/dmamur/elembert.