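Pursuing Extreme Descriptive Power and Generality in Chemical Bond Theories: A Method to Decipher “Interatomic Genomes” from Interatomic Electron Structures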

IF 5.7 · Zone 1 (Chemistry) · JCR Q2 (Chemistry, Physical)
Xinxu Zhang, Jiahao Wei, Hui Jia, Jiamin Liu, Guo Li, Ling Liu, Yulong Wu, Changlong Liu, Xiao-Dong Zhang, Yonghui Li
DOI: 10.1021/acs.jctc.4c00557 (https://doi.org/10.1021/acs.jctc.4c00557)
Journal: Journal of Chemical Theory and Computation
Publication date: 2024-09-16
Publication type: Journal Article
Citations: 0


Pursuing Extreme Descriptive Power and Generality in Chemical Bond Theories: A Method to Decipher “Interatomic Genomes” from Interatomic Electron Structures

The description and analysis of chemical bonds have remained difficult since electronic structure calculations became widespread. Although many approaches start from the electronic structure, the sheer volume of information it contains leaves contemporary chemical bond analysis methods facing an inescapable “trilemma”: model brevity, generality, and descriptiveness (descriptive power) cannot all be achieved simultaneously. To push generality and descriptiveness to their extremes, a general machine learning-based framework is introduced here that compacts chemical bonds into a detailed residue-by-residue “genome” with matched encoding and decoding tools. The framework combines quantum mechanical aspects, automatic feature extraction, nanostructures and/or simulations, and generative models. The encoded genomes are information-dense and decodable, and 100% generality is guaranteed. The descriptiveness of the genomes appears to be broader than that of most known models. As a proof of concept, the realization presented in this work compacts the complete information on two critical chemical bonds in thiolate-protected gold nanoclusters, the S–Au and Au–Au bonds, from a bosonic-fermionic character perspective into 8-valued genomes. The machine learning component is trained on 26,528 electron localization function images simulated with density functional theory. Exploring the space spanned by the genomes reveals bond polarization, hybridization, intrusion of other atoms, alignments, crystal orientation, atomic motions, and further details. Furthermore, extensive generation tests show that molecules and solids can be integrated in a more concise manner than is typically achieved with purely geometric representations. To show the intraclass complexity of S–Au and Au–Au bonds visually, a roadmap is plotted by summarizing and correlating the similarities of the 8-valued genomes. In addition, genomes can be readily associated with realistic indices using a simple multilayer perceptron as a calculating tool. Three sets of applications, covering chemisorption, molecular dynamics analysis, and ultrafast processes, showcase the interpretive potential of interatomic genomes for the geometric structures, kinetic properties, and vibrational characteristics of molecular systems. Because the framework rose to the challenge of nanoclusters, which belong to a complicated mesoscopic family of materials, the generality and comprehensiveness it displays indicate that the model may “understand” chemical bonds in a machine’s way.
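The abstract mentions a roadmap built by correlating the similarities of 8-valued genomes but does not say how similarity is measured. As a minimal sketch only, the snippet below assumes cosine similarity as the metric and uses made-up placeholder genomes. The names `cosine_similarity`, `similarity_matrix`, and `example_genomes` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

GENOME_LENGTH = 8  # the abstract states that each bond is encoded as 8 values


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two genomes (an assumed metric, not the paper's)."""
    if len(a) != GENOME_LENGTH or len(b) != GENOME_LENGTH:
        raise ValueError("each genome must have exactly 8 values")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero genome")
    return dot / (norm_a * norm_b)


def similarity_matrix(genomes: Sequence[Sequence[float]]) -> List[List[float]]:
    """Symmetric matrix of pairwise similarities, one row per bond genome."""
    return [[cosine_similarity(g, h) for h in genomes] for g in genomes]


# Made-up placeholder genomes; real ones would come from the trained encoder.
example_genomes = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
matrix = similarity_matrix(example_genomes)
```

In this toy input the first two genomes point in the same direction and so score as maximally similar, while the third is orthogonal to both. A matrix of this kind could feed a clustering or graph layout step to draw a roadmap, although the paper may construct its roadmap differently.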
Source journal
Journal of Chemical Theory and Computation
Category: Chemistry; Physics, Atomic, Molecular and Chemical
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
Journal description: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.