Relevant, Hidden, and Frustrated Information in High-Dimensional Analyses of Complex Dynamical Systems with Internal Noise.

IF 5.7 · Zone 1 (Chemistry) · Q2 CHEMISTRY, PHYSICAL
Journal of Chemical Theory and Computation Pub Date : 2025-07-22 Epub Date: 2025-07-02 DOI:10.1021/acs.jctc.5c00374
Chiara Lionello, Matteo Becchi, Simone Martino, Giovanni M Pavan
Pages: 6683-6697. Publication type: Journal Article.

Abstract

Extracting from trajectory data meaningful information to understand complex molecular systems might be nontrivial. High-dimensional analyses are typically assumed to be desirable, if not required, to prevent losing important information. But to what extent such high-dimensionality is really needed/beneficial often remains unclear. Here we challenge such a fundamental general problem. As a representative case of a system with internal dynamical complexity, we study atomistic molecular dynamics trajectories of liquid water and ice coexisting in dynamical equilibrium at the solid/liquid transition temperature. To attain an intrinsically high-dimensional analysis, we use as an example an abstract high-dimensional descriptor of local molecular environments (e.g., Smooth Overlap of Atomic Positions, SOAP), obtaining a large dataset containing 2.56 × 10^6 576-dimensional SOAP spectra that we analyze in various ways. Our results demonstrate how the time-series data contained in one single SOAP dimension accounting for only <0.001% of the total dataset's variance (neglected and discarded in typical variance-based dimensionality reduction approaches) allows resolving a remarkable amount of information, classifying/discriminating the bulk of water and ice phases, as well as two solid-interface and liquid-interface layers as four statistically distinct dynamical molecular environments. Adding more dimensions to this one is found not only ineffective but even detrimental to the analysis due to recurrent negligible-information/non-negligible-noise additions and "frustrated information" phenomena leading to information loss. Such effects are proven general and are observed also in completely different systems and descriptors' combinations. This shows how high-dimensional analyses are not necessarily better than low-dimensional ones to elucidate the internal complexity of physical/chemical systems, especially when these are characterized by non-negligible internal noise.
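
The point about variance-based dimensionality reduction can be illustrated with a toy sketch. It does not use the paper's SOAP data and is not the authors' analysis: it assumes a synthetic two-class dataset in which one low-variance column carries all the class information and twenty high-variance columns are pure noise. All names (make_data, split_accuracy, first_principal_component) and all parameter values are hypothetical choices made for this illustration only.

```python
import numpy as np


def make_data(n=2000, n_noise_dims=20, seed=0):
    """Synthetic data (assumption): column 0 is a tiny-amplitude signal
    that encodes the class; the remaining columns are unit-variance noise."""
    rng = np.random.default_rng(seed)
    labels = np.arange(n) % 2  # exactly balanced classes 0 and 1
    signal = np.where(labels == 1, 0.01, -0.01) + rng.normal(0.0, 0.002, size=n)
    noise = rng.normal(0.0, 1.0, size=(n, n_noise_dims))
    return np.column_stack([signal, noise]), labels


def split_accuracy(values, labels):
    """Split a 1D variable at its median and score the split against the
    true labels, ignoring which side is called class 0 or class 1."""
    pred = (values > np.median(values)).astype(int)
    acc = float(np.mean(pred == labels))
    return max(acc, 1.0 - acc)


def first_principal_component(X):
    """Project centered data onto its direction of largest variance."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[0]


X, labels = make_data()

# Fraction of the total variance carried by the informative column.
variances = X.var(axis=0)
signal_variance_share = variances[0] / variances.sum()

# Classify with the single low-variance column vs. the top principal component.
acc_single = split_accuracy(X[:, 0], labels)
acc_pca = split_accuracy(first_principal_component(X), labels)

print(f"variance share of informative column: {signal_variance_share:.2e}")
print(f"accuracy, single low-variance column: {acc_single:.3f}")
print(f"accuracy, first principal component:  {acc_pca:.3f}")
```

Under these assumptions the informative column holds a negligible share of the total variance, so a variance-ranked projection points almost entirely along the noise columns and its split should stay close to chance, while the single column should separate the two classes almost perfectly. The sketch only shows why "low variance" and "low information" are different things; it says nothing quantitative about the water/ice system studied in the paper.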

Source journal
Journal of Chemical Theory and Computation (Chemistry / Physics: Atomic, Molecular, and Chemical Physics)
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
About the journal: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.