Nearl: Extracting dynamic features from molecular dynamics trajectories for machine learning tasks.

Yang Zhang, Andreas Vitalis
Bioinformatics (Oxford, England). Journal Article. Published 2025-05-29. DOI: https://doi.org/10.1093/bioinformatics/btaf321

Abstract

Summary: Despite the rapid growth of machine learning in biomolecular applications, information about protein dynamics is underutilized. Here, we introduce Nearl, an automated pipeline designed to extract dynamic features from large ensembles of molecular dynamics (MD) trajectories. Nearl aims to identify intrinsic patterns of molecular motion and to provide informative features for predictive modelling tasks. We implement two classes of dynamic features, termed marching observers and property-density flow, to capture local atomic motions while maintaining a view of the global configuration. Complemented by standard voxelization techniques, Nearl transforms substructures of proteins into 3D grids, suitable for contemporary 3D convolutional neural networks (3D-CNNs). The pipeline leverages GPU acceleration, adheres to the FAIR principles for research software, and prioritizes flexibility and user-friendliness, allowing customization of input formats and feature extraction.

Availability and implementation: The source code of Nearl is hosted at https://github.com/miemiemmmm/Nearl and archived at https://doi.org/10.5281/zenodo.15320286. The documentation is hosted on ReadTheDocs at https://nearl.readthedocs.io/en/latest/. All pre-built models are implemented in PyTorch and available on GitHub.

Supplementary information: Supplementary data are available at Bioinformatics online.
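
The voxelization step mentioned in the summary can be illustrated with a minimal sketch. This is not Nearl's API and does not reproduce the marching observers or property-density flow features; the function names (`voxelize`, `voxelize_trajectory`), the box size, the grid dimensions, and the simple time-averaging are all assumptions made for illustration, using only NumPy.

```python
import numpy as np

def voxelize(coords, weights, center, box_length=16.0, dims=32):
    """Bin weighted atoms into a cubic grid of dims**3 voxels around `center`.

    Hypothetical helper: atoms outside the box are silently dropped.
    """
    coords = np.asarray(coords, dtype=float)      # shape (n_atoms, 3)
    weights = np.asarray(weights, dtype=float)    # shape (n_atoms,)
    center = np.asarray(center, dtype=float)      # shape (3,)
    half = box_length / 2.0
    # One array of dims + 1 bin edges per axis, centered on `center`.
    edges = [np.linspace(c - half, c + half, dims + 1) for c in center]
    grid, _ = np.histogramdd(coords, bins=edges, weights=weights)
    return grid.astype(np.float32)

def voxelize_trajectory(frames, weights, center, **kwargs):
    """Average per-frame grids (an illustrative stand-in for a dynamic feature)."""
    grids = [voxelize(frame, weights, center, **kwargs) for frame in frames]
    return np.mean(grids, axis=0)

# Synthetic data standing in for an MD trajectory: 10 frames, 50 atoms.
rng = np.random.default_rng(0)
frames = rng.normal(loc=0.0, scale=3.0, size=(10, 50, 3))
masses = np.full(50, 12.0)  # assumed per-atom property (here: mass)
grid = voxelize_trajectory(frames, masses, center=(0.0, 0.0, 0.0))
print(grid.shape)  # (32, 32, 32)
```

The resulting array has the layout a 3D-CNN typically expects for a single channel; in PyTorch, for example, one would add batch and channel axes before passing it to a `Conv3d` layer. Hard binning is the simplest choice here; smoother schemes such as Gaussian spreading of each atom over neighboring voxels are common alternatives.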
