Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties

Shokirbek Shermukhamedov, Dilorom Mamurjonova, Michael Probst
arXiv - PHYS - Atomic and Molecular Clusters · Published 2023-09-17 · DOI: arxiv-2309.09355

Abstract

The application of machine learning (ML) techniques in computational chemistry has led to significant advances in predicting molecular properties and in accelerating drug discovery and materials design. ML models can extract hidden patterns and relationships from large, complex datasets, allowing various chemical properties to be predicted with high accuracy. Such methods have enabled the discovery of molecules and materials that were previously difficult to identify. This paper introduces a new ML model for classification tasks, based on deep learning techniques such as a multilayer encoder and decoder architecture. We demonstrate the opportunities offered by our approach by applying it to various types of input data, including organic and inorganic compounds. In particular, we developed and tested the model using the Matbench and MoleculeNet benchmarks, which cover crystal properties and drug-design-related tasks. We also conduct a comprehensive analysis of vector representations of chemical compounds, shedding light on the underlying patterns in molecular data. The models used in this work exhibit a high degree of predictive power, underscoring the progress that can be made with refined machine learning when applied to molecular and material datasets. For instance, on the Tox21 dataset, we achieved an average accuracy of 96%, surpassing the previous best result by 10%. Our code is publicly available at https://github.com/dmamur/elembert.