High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function.

Workshop on Machine Learning in HPC Environments, 2021, pp. 46-57. Pub Date: 2021-11-01. DOI: 10.1109/mlhpc54614.2021.00010

Mu Gao, Peik Lund-Andersen, Alex Morehead, Sajid Mahmud, Chen Chen, Xiao Chen, Nabin Giri, Raj S Roy, Farhan Quadir, T Chad Effler, Ryan Prout, Subil Abraham, Wael Elwasif, N Quentin Haas, Jeffrey Skolnick, Jianlin Cheng, Ada Sedova


Citations: 10

Abstract



Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models for high-throughput data such as proteomics data. We showcase methodologies our pipeline currently supports and detail future tasks for our pipeline to envelop, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.

