Anastasiya V. Kulikova, Daniel J. Diaz, James M. Loy, Andrew D. Ellington, Claus O. Wilke
{"title":"Learning the local landscape of protein structures with convolutional neural networks","authors":"Anastasiya V. Kulikova, Daniel J. Diaz, James M. Loy, Andrew D. Ellington, Claus O. Wilke","doi":"10.1007/s10867-021-09593-6","DOIUrl":null,"url":null,"abstract":"<div><p>One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding site of interest. We find that the network can predict wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely going to be correct or not. Predictions of consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.</p></div>","PeriodicalId":612,"journal":{"name":"Journal of Biological Physics","volume":null,"pages":null},"PeriodicalIF":1.8000,"publicationDate":"2021-11-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10867-021-09593-6.pdf","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biological Physics","FirstCategoryId":"99","ListUrlMain":"https://link.springer.com/article/10.1007/s10867-021-09593-6","RegionNum":4,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"BIOPHYSICS","Score":null,"Total":0}
引用次数: 5
Abstract
One fundamental problem of protein biochemistry is to predict protein structure from amino acid sequence. The inverse problem, predicting either entire sequences or individual mutations that are consistent with a given protein structure, has received much less attention even though it has important applications in both protein engineering and evolutionary biology. Here, we ask whether 3D convolutional neural networks (3D CNNs) can learn the local fitness landscape of protein structure to reliably predict either the wild-type amino acid or the consensus in a multiple sequence alignment from the local structural context surrounding site of interest. We find that the network can predict wild type with good accuracy, and that network confidence is a reliable measure of whether a given prediction is likely going to be correct or not. Predictions of consensus are less accurate and are primarily driven by whether or not the consensus matches the wild type. Our work suggests that high-confidence mis-predictions of the wild type may identify sites that are primed for mutation and likely targets for protein engineering.
期刊介绍:
Many physicists are turning their attention to domains that were not traditionally part of physics and are applying the sophisticated tools of theoretical, computational and experimental physics to investigate biological processes, systems and materials.
The Journal of Biological Physics provides a medium where this growing community of scientists can publish its results and discuss its aims and methods. It welcomes papers which use the tools of physics in an innovative way to study biological problems, as well as research aimed at providing a better understanding of the physical principles underlying biological processes.