Deep Learning of Protein Structural Classes: Any Evidence for an ‘Urfold’?

2020 Systems and Information Engineering Design Symposium (SIEDS) | Pub Date: 2020-04-01 | DOI: 10.1109/SIEDS49339.2020.9106642

Menuka Jaiswal, Saad Saleem, Yonghyeon Kweon, Eli J. Draizen, S. Veretnik, C. Mura, P. Bourne


Citations: 3

Abstract


Deep Learning of Protein Structural Classes: Any Evidence for an ‘Urfold’?

Recent computational advances in the accurate prediction of protein three-dimensional (3D) structures from amino acid sequences now present a unique opportunity to decipher the interrelationships between proteins. This task entails—but is not equivalent to—a problem of 3D structure comparison and classification. Historically, protein domain classification has been a largely manual and subjective activity, relying upon various heuristics. Databases such as CATH represent significant steps towards a more systematic (and automatable) approach, yet there still remains much room for the development of more scalable and quantitative classification methods, grounded in machine learning. We suspect that re-examining these relationships via a Deep Learning (DL) approach may entail a large-scale restructuring of classification schemes, improved with respect to the interpretability of distant relationships between proteins. Here, we describe our training of DL models on protein domain structures (and their associated physicochemical properties) in order to evaluate classification properties at CATH’s “homologous superfamily” (SF) level. To achieve this, we have devised and applied an extension of image-classification methods and image segmentation techniques, utilizing a convolutional autoencoder model architecture. Our DL architecture allows models to learn structural features that, in a sense, ‘define’ different homologous SFs. We evaluate and quantify pairwise ‘distances’ between SFs by building one model per SF and comparing the loss functions of the models. Hierarchical clustering on these distance matrices provides a new view of protein interrelationships—a view that extends beyond simple structural/geometric similarity, and towards the realm of structure/function properties.
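The last two steps of the abstract (turning per-superfamily model losses into pairwise distances, then clustering them hierarchically) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' method: the superfamily labels and loss values are made up, the distance is defined as the symmetrized excess reconstruction loss, and average linkage stands in for whichever clustering criterion the paper actually uses. The abstract says only that the loss functions of the per-SF models are compared.

```python
from itertools import combinations


def loss_to_distance(loss):
    """Symmetrize a cross-loss matrix into a pairwise distance matrix.

    Assumption: loss[i][j] is the mean reconstruction loss of the autoencoder
    trained on superfamily i when it is evaluated on domains of superfamily j.
    """
    n = len(loss)
    dist = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        # Excess loss relative to each model's loss on its own superfamily.
        excess_ij = loss[i][j] - loss[i][i]
        excess_ji = loss[j][i] - loss[j][j]
        d = max(0.0, 0.5 * (excess_ij + excess_ji))
        dist[i][j] = dist[j][i] = d
    return dist


def average_linkage(dist, labels):
    """Agglomerative clustering with average linkage.

    Returns a list of (sorted member labels, merge height) in merge order.
    """
    clusters = {i: [i] for i in range(len(labels))}
    merges = []

    def avg(a, b):
        # Mean distance over all cross-cluster pairs of original items.
        pairs = [(x, y) for x in clusters[a] for y in clusters[b]]
        return sum(dist[x][y] for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the two clusters with the smallest average distance.
        a, b = min(combinations(sorted(clusters), 2), key=lambda p: avg(*p))
        height = avg(a, b)
        members = sorted(labels[k] for k in clusters[a] + clusters[b])
        merges.append((members, height))
        clusters[a] = clusters[a] + clusters.pop(b)
    return merges


# Hypothetical superfamily labels and cross-loss values (illustration only).
labels = ["SF_A", "SF_B", "SF_C"]
loss = [
    [0.10, 0.14, 0.50],
    [0.16, 0.12, 0.46],
    [0.52, 0.48, 0.08],
]

dist = loss_to_distance(loss)
merges = average_linkage(dist, labels)
for members, height in merges:
    print(members, round(height, 3))
```

In this toy input, the models for SF_A and SF_B reconstruct each other's domains almost as well as their own, so the two superfamilies merge first. SF_C joins only at a much larger height. Relationships of this kind, read off the dendrogram, are what the abstract refers to as a new view of protein interrelationships.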
