Nearl: Extracting dynamic features from molecular dynamics trajectories for machine learning tasks.
Yang Zhang, Andreas Vitalis
Bioinformatics (Oxford, England), published 2025-05-29. DOI: https://doi.org/10.1093/bioinformatics/btaf321
Summary: Despite the rapid growth of machine learning in biomolecular applications, information about protein dynamics is underutilized. Here, we introduce Nearl, an automated pipeline designed to extract dynamic features from large ensembles of molecular dynamics (MD) trajectories. Nearl aims to identify intrinsic patterns of molecular motion and to provide informative features for predictive modelling tasks. We implement two classes of dynamic features, termed marching observers and property-density flow, to capture local atomic motions while maintaining a view of the global configuration. Complemented by standard voxelization techniques, Nearl transforms substructures of proteins into 3D grids, suitable for contemporary 3D convolutional neural networks (3D-CNNs). The pipeline leverages GPU acceleration, adheres to the FAIR principles for research software, and prioritizes flexibility and user-friendliness, allowing customization of input formats and feature extraction.
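As a rough illustration of the voxelization step mentioned in the summary, the following sketch spreads per-atom weights onto a cubic grid with a Gaussian kernel and then averages the grids over trajectory frames. It is not Nearl's API or its implementation: the function names `voxelize` and `mean_over_frames`, the kernel, the cutoff, and all parameter defaults are assumptions chosen for the example, and the frame average is a generic aggregation rather than the marching-observer or property-density-flow features defined in the paper. It uses only the Python standard library; a practical pipeline would use array libraries and GPU kernels.

```python
import math


def voxelize(coords, weights, center, n_bins=8, spacing=1.0, sigma=1.0, cutoff=3.0):
    """Spread per-atom weights onto an n_bins^3 grid centred on `center`.

    Hypothetical helper for illustration only (not part of Nearl).
    coords:  list of (x, y, z) atom positions
    weights: one scalar property per atom (e.g. mass or partial charge)
    Returns a nested list grid[i][j][k] of accumulated density.
    """
    half = n_bins * spacing / 2.0
    origin = [c - half for c in center]  # lower corner of the box
    grid = [[[0.0] * n_bins for _ in range(n_bins)] for _ in range(n_bins)]
    reach = int(math.ceil(cutoff / spacing))  # voxels to visit around each atom
    cutoff2 = cutoff * cutoff

    for pos, w in zip(coords, weights):
        # Index of the voxel that contains the atom (may lie outside the grid).
        idx = [int(math.floor((p - o) / spacing)) for p, o in zip(pos, origin)]
        # Clamp the visited neighbourhood to the grid bounds on each axis.
        lo = [max(0, a - reach) for a in idx]
        hi = [min(n_bins, a + reach + 1) for a in idx]
        for i in range(lo[0], hi[0]):
            dx = pos[0] - (origin[0] + (i + 0.5) * spacing)
            for j in range(lo[1], hi[1]):
                dy = pos[1] - (origin[1] + (j + 0.5) * spacing)
                for k in range(lo[2], hi[2]):
                    dz = pos[2] - (origin[2] + (k + 0.5) * spacing)
                    d2 = dx * dx + dy * dy + dz * dz
                    if d2 <= cutoff2:
                        # Gaussian contribution of this atom to this voxel centre.
                        grid[i][j][k] += w * math.exp(-d2 / (2.0 * sigma * sigma))
    return grid


def mean_over_frames(frames, weights, center, **kwargs):
    """Average the voxel grids of several frames (a generic time aggregation)."""
    grids = [voxelize(frame, weights, center, **kwargs) for frame in frames]
    n = len(grids)
    nb = len(grids[0])
    return [
        [[sum(g[i][j][k] for g in grids) / n for k in range(nb)] for j in range(nb)]
        for i in range(nb)
    ]


# Usage: one atom that moves by one voxel between two frames.
frames = [[(0.5, 0.5, 0.5)], [(-0.5, 0.5, 0.5)]]
averaged = mean_over_frames(frames, [6.0], (0.0, 0.0, 0.0), n_bins=4)
```

The resulting nested list has the shape of a single-channel 3D image; stacking several properties as channels would give the kind of input a 3D-CNN expects. Fixing the box on a chosen center keeps the grid local to a substructure, which is the design choice the summary describes.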
Availability and implementation: The source code of Nearl is hosted at https://github.com/miemiemmmm/Nearl and archived at https://doi.org/10.5281/zenodo.15320286. The documentation is hosted on ReadTheDocs at https://nearl.readthedocs.io/en/latest/. All pre-built models are implemented in PyTorch and available on GitHub.
Supplementary information: Supplementary data are available at Bioinformatics online.