Nearl: Extracting dynamic features from molecular dynamics trajectories for machine learning tasks
Yang Zhang, Andreas Vitalis
Bioinformatics (Oxford, England), published 2025-05-29
DOI: 10.1093/bioinformatics/btaf321
Abstract
Summary: Despite the rapid growth of machine learning in biomolecular applications, information about protein dynamics is underutilized. Here, we introduce Nearl, an automated pipeline designed to extract dynamic features from large ensembles of molecular dynamics (MD) trajectories. Nearl aims to identify intrinsic patterns of molecular motion and to provide informative features for predictive modelling tasks. We implement two classes of dynamic features, termed marching observers and property-density flow, to capture local atomic motions while maintaining a view of the global configuration. Complemented by standard voxelization techniques, Nearl transforms substructures of proteins into 3D grids, suitable for contemporary 3D convolutional neural networks (3D-CNNs). The pipeline leverages GPU acceleration, adheres to the FAIR principles for research software, and prioritizes flexibility and user-friendliness, allowing customization of input formats and feature extraction.
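The abstract does not specify how marching observers or property-density flow are computed. As a rough illustration only, the sketch below shows the generic idea behind grid-based dynamic features. It spreads a per-atom property onto a cubic 3D grid with a Gaussian kernel for every MD frame, then summarizes each voxel over time (mean and standard deviation) so the result can be stacked as input channels for a 3D-CNN. The functions `voxelize_frame` and `dynamic_grid_features`, the Gaussian kernel, and all parameters are hypothetical. They are not Nearl's API and not its marching-observer or property-density-flow algorithms.

```python
import numpy as np

def voxelize_frame(coords, weights, center, grid_dim=16, spacing=1.0, sigma=1.0):
    """Hypothetical helper: spread per-atom weights onto a cubic grid.

    coords:  (n_atoms, 3) atom positions (e.g. in Angstrom)
    weights: (n_atoms,) per-atom property (e.g. mass or partial charge)
    center:  (3,) position of the grid center
    Returns a (grid_dim, grid_dim, grid_dim) density grid.
    """
    coords = np.asarray(coords, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Voxel-center offsets along one axis, symmetric about 0
    axis = (np.arange(grid_dim) - (grid_dim - 1) / 2.0) * spacing
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    # Absolute voxel-center positions: (D, D, D, 3)
    grid_points = np.stack([gx, gy, gz], axis=-1) + np.asarray(center, dtype=float)
    # Squared distance from every voxel center to every atom: (D, D, D, n_atoms)
    diff = grid_points[..., None, :] - coords[None, None, None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Unnormalized isotropic Gaussian contribution of each atom to each voxel
    kernel = np.exp(-d2 / (2.0 * sigma ** 2))
    # Weighted sum over atoms: (D, D, D)
    return kernel @ weights

def dynamic_grid_features(trajectory, weights, center, **grid_kwargs):
    """Hypothetical helper: voxelize each frame, then summarize per voxel over time.

    trajectory: (n_frames, n_atoms, 3) coordinates
    Returns (mean_grid, std_grid), each of shape (D, D, D).
    """
    frames = np.stack(
        [voxelize_frame(frame, weights, center, **grid_kwargs) for frame in trajectory]
    )
    # Mean captures the average occupancy; std captures local fluctuation (motion)
    return frames.mean(axis=0), frames.std(axis=0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    traj = rng.normal(scale=2.0, size=(10, 50, 3))  # 10 frames, 50 atoms (synthetic)
    mean_grid, std_grid = dynamic_grid_features(
        traj, np.ones(50), center=np.zeros(3), grid_dim=8
    )
    # Stack as a 2-channel input of shape (channels, D, D, D) for a 3D-CNN
    features = np.stack([mean_grid, std_grid])
    print(features.shape)  # (2, 8, 8, 8)
```

Computing all voxel-atom distances at once keeps the sketch short, but memory grows with grid size times atom count. A practical pipeline would restrict each atom's contribution to a cutoff radius and, as the abstract notes for Nearl, offload the work to the GPU.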
Availability and implementation: The source code of Nearl is hosted at https://github.com/miemiemmmm/Nearl and archived at https://doi.org/10.5281/zenodo.15320286. The documentation is hosted on ReadTheDocs at https://nearl.readthedocs.io/en/latest/. All pre-built models are implemented in PyTorch and available on GitHub.
Supplementary information: Supplementary data are available at Bioinformatics online.