Chiara Lionello, Matteo Becchi, Simone Martino, Giovanni M Pavan
Relevant, Hidden, and Frustrated Information in High-Dimensional Analyses of Complex Dynamical Systems with Internal Noise
Journal of Chemical Theory and Computation, pp. 6683-6697. Published 2025-07-22 (Epub 2025-07-02). DOI: 10.1021/acs.jctc.5c00374 (https://doi.org/10.1021/acs.jctc.5c00374)
Citations: 0
Abstract
Extracting meaningful information from trajectory data to understand complex molecular systems can be nontrivial. High-dimensional analyses are typically assumed to be desirable, if not required, to avoid losing important information. But to what extent such high dimensionality is really needed or beneficial often remains unclear. Here we tackle this fundamental, general problem. As a representative case of a system with internal dynamical complexity, we study atomistic molecular dynamics trajectories of liquid water and ice coexisting in dynamical equilibrium at the solid/liquid transition temperature. To attain an intrinsically high-dimensional analysis, we use as an example an abstract high-dimensional descriptor of local molecular environments (e.g., Smooth Overlap of Atomic Positions, SOAP), obtaining a large dataset containing 2.56 × 10^6 576-dimensional SOAP spectra that we analyze in various ways. Our results demonstrate how the time-series data contained in a single SOAP dimension accounting for only <0.001% of the total dataset's variance (neglected and discarded in typical variance-based dimensionality reduction approaches) allow resolving a remarkable amount of information, discriminating the bulk water and ice phases as well as two interface layers (solid-interface and liquid-interface) as four statistically distinct dynamical molecular environments. Adding more dimensions to this one is found to be not only ineffective but even detrimental to the analysis, owing to recurrent additions of negligible information and non-negligible noise and to "frustrated information" phenomena that lead to information loss. Such effects are shown to be general and are also observed in completely different combinations of systems and descriptors. This shows that high-dimensional analyses are not necessarily better than low-dimensional ones for elucidating the internal complexity of physical/chemical systems, especially when these are characterized by non-negligible internal noise.
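The central claim, that a dimension carrying a negligible share of the variance can hold most of the useful information while extra dimensions mainly add noise, can be illustrated with a toy example. The sketch below is not the authors' analysis and does not use SOAP data: it assumes a synthetic 10-dimensional dataset with two classes, and it uses a simple nearest-centroid classifier and a first principal component as stand-ins for "high-dimensional" and "variance-based" analyses. All names (`informative`, `noise`, `centroid_accuracy`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000                                   # samples per class (synthetic)
labels = np.repeat([0, 1], n)

# Dimension 0: tiny variance, but the two classes are well separated.
informative = np.where(labels == 0, -0.01, 0.01) + rng.normal(0.0, 0.002, 2 * n)
# Dimensions 1..9: large variance, same distribution for both classes.
noise = rng.normal(0.0, 10.0, (2 * n, 9))
X = np.column_stack([informative, noise])

# Fraction of the total variance carried by each dimension.
var_fraction = X.var(axis=0) / X.var(axis=0).sum()

def centroid_accuracy(data, labels):
    """In-sample accuracy of assigning each point to the nearest class centroid."""
    c0 = data[labels == 0].mean(axis=0)
    c1 = data[labels == 1].mean(axis=0)
    d0 = np.linalg.norm(data - c0, axis=1)
    d1 = np.linalg.norm(data - c1, axis=1)
    return ((d1 < d0).astype(int) == labels).mean()

# One low-variance dimension alone versus all ten dimensions together.
acc_one = centroid_accuracy(X[:, :1], labels)
acc_all = centroid_accuracy(X, labels)

# Variance-based reduction: keep only the first principal component (via SVD).
Xc = X - X.mean(axis=0)
_, _, vt = np.linalg.svd(Xc, full_matrices=False)
acc_pc1 = centroid_accuracy((Xc @ vt[0])[:, None], labels)

print(f"variance fraction of dimension 0: {var_fraction[0]:.2e}")
print(f"accuracy, dimension 0 alone:      {acc_one:.3f}")
print(f"accuracy, all 10 dimensions:      {acc_all:.3f}")
print(f"accuracy, first principal comp.:  {acc_pc1:.3f}")
```

In this toy setting the single low-variance dimension separates the classes almost perfectly, whereas both the full 10-dimensional space and the leading principal component perform close to chance, because the high-variance noise dimensions dominate distances and variance. This only mimics the noise-addition effect described in the abstract; it makes no attempt to reproduce the paper's "frustrated information" analysis.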
About the journal:
The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.