Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

arXiv - PHYS - Atomic and Molecular Clusters | Pub Date: 2023-09-17 | DOI: arxiv-2309.09355

Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst

Abstract

The application of machine learning (ML) techniques in computational chemistry has led to significant advances in predicting molecular properties, accelerating drug discovery, and designing materials. ML models can extract hidden patterns and relationships from large, complex datasets, allowing various chemical properties to be predicted with high accuracy. Such methods have enabled the discovery of molecules and materials that were previously difficult to identify. This paper introduces a new ML model for classification tasks, based on deep learning techniques such as a multilayer encoder and decoder architecture. We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds. In particular, we developed and tested the model using the Matbench and MoleculeNet benchmarks, which cover crystal properties and drug-design-related tasks. We also conduct a comprehensive analysis of vector representations of chemical compounds, shedding light on the underlying patterns in molecular data. The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning when applied to molecular and material datasets. For instance, on the Tox21 dataset, we achieved an average accuracy of 96%, surpassing the previous best result by 10%. Our code is publicly available at https://github.com/dmamur/elembert.
