Explore Conformational Space of Proteins with Supervised Auto-Encoder

Chinese Physics, Pub Date: 2023-01-01, DOI: 10.7498/aps.72.20231060

Chen Guanglin, Zhang Zhiyong


Abstract

Protein function is related to its structure and dynamics. Molecular dynamics (MD) simulation is an important tool for studying protein dynamics by exploring conformational space. However, conformational sampling is nontrivial, because under-sampling risks missing key details. In recent years, deep learning methods such as auto-encoders have been coupled with MD to explore the conformational space of proteins. After training on MD trajectories, an auto-encoder can quickly generate new conformations by taking random numbers in the low-dimensional space as input. However, some issues remain, such as the requirements on the quality of the training set, the limited explorable region, and the undefined sampling direction. In this work, we build a supervised auto-encoder in which reaction coordinates are used to guide conformational exploration along chosen directions. We also try to expand the explorable region by training with data generated by the model itself. Two multi-domain proteins, bacteriophage T4 lysozyme and adenylate kinase, are used to illustrate the method. Even when the training set consists only of under-sampled simulation trajectories, the supervised auto-encoder can still explore along the given reaction coordinates. The explored conformational space covers all the experimental structures of the proteins and extends to regions far from the training sets. Verification by molecular dynamics and secondary structure calculations shows that most of the explored conformations are plausible. The supervised auto-encoder provides a way to efficiently expand the conformational space of a protein with limited computational resources, although suitable reaction coordinates are required. By integrating appropriate reaction coordinates or experimental data, the supervised auto-encoder may serve as an efficient tool for exploring the conformational space of proteins.
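The idea of supervising an auto-encoder with a reaction coordinate can be illustrated with a toy sketch. The abstract does not specify the network architecture, loss weighting, or input features, so everything below is an assumption made for illustration: a linear encoder and decoder, synthetic data in place of MD frames, a reaction coordinate that is a linear function of the features, and the hypothetical names `loss`, `train`, and `explore`. The total loss is the reconstruction error plus a penalty that ties the first latent dimension to the reaction coordinate, so that dimension can later be swept to generate conformations in a chosen direction.

```python
import numpy as np

def loss(X, rc, W_enc, W_dec, weight=1.0):
    """Reconstruction error plus a penalty tying latent[:, 0] to the reaction coordinate."""
    Z = X @ W_enc                                # encode: (n, d) -> (n, k)
    recon = np.mean((Z @ W_dec - X) ** 2)        # decode and compare with the input
    sup = np.mean((Z[:, 0] - rc) ** 2)           # supervision on the first latent dimension
    return recon + weight * sup

def train(X, rc, k=2, weight=1.0, lr=0.05, steps=300, seed=0):
    """Plain gradient descent on the supervised loss (linear toy model, an assumption)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, k))
    W_dec = rng.normal(scale=0.1, size=(k, d))
    for _ in range(steps):
        Z = X @ W_enc
        R = Z @ W_dec - X                        # reconstruction residual
        grad_Z = (2.0 / (n * d)) * R @ W_dec.T   # d(recon)/dZ
        grad_Z[:, 0] += weight * (2.0 / n) * (Z[:, 0] - rc)  # d(sup)/dZ
        grad_dec = (2.0 / (n * d)) * Z.T @ R     # d(recon)/dW_dec
        W_enc -= lr * (X.T @ grad_Z)             # chain rule through Z = X @ W_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec

def explore(W_dec, rc_values, other=0.0):
    """Decode latent points whose first coordinate follows the requested rc values."""
    Z = np.full((len(rc_values), W_dec.shape[0]), other, dtype=float)
    Z[:, 0] = rc_values
    return Z @ W_dec

# Synthetic stand-in for featurized MD frames (not real protein data).
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
rc = X @ np.array([0.6, 0.8, 0.0, 0.0, 0.0, 0.0])  # toy reaction coordinate

W_enc, W_dec = train(X, rc)
final_loss = loss(X, rc, W_enc, W_dec)

# Sweep the supervised latent dimension past the range seen in training.
new_confs = explore(W_dec, np.linspace(rc.min() - 1.0, rc.max() + 1.0, 5))
```

In this sketch, generated points such as `new_confs` could be appended to the training set and the model retrained, which is one possible reading of "training with data generated by the model". A practical version would use nonlinear networks and would check generated structures for plausibility, for example with short MD runs and secondary structure calculations as the abstract describes.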


