Mu Gao, Peik Lund-Andersen, Alex Morehead, Sajid Mahmud, Chen Chen, Xiao Chen, Nabin Giri, Raj S Roy, Farhan Quadir, T Chad Effler, Ryan Prout, Subil Abraham, Wael Elwasif, N Quentin Haas, Jeffrey Skolnick, Jianlin Cheng, Ada Sedova
Workshop on Machine Learning in HPC Environments, vol. 2021, pp. 46-57. Published 2021-11-01. DOI: 10.1109/mlhpc54614.2021.00010. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8802329/pdf/nihms-1769610.pdf
High-Performance Deep Learning Toolbox for Genome-Scale Prediction of Protein Structure and Function.
Computational biology is one of many scientific disciplines ripe for innovation and acceleration with the advent of high-performance computing (HPC). In recent years, the field of machine learning has also seen significant benefits from adopting HPC practices. In this work, we present a novel HPC pipeline that incorporates various machine-learning approaches for structure-based functional annotation of proteins on the scale of whole genomes. Our pipeline makes extensive use of deep learning and provides computational insights into best practices for training advanced deep-learning models on high-throughput data such as proteomics data. We showcase the methodologies our pipeline currently supports and detail future tasks for it to encompass, including large-scale sequence comparison using SAdLSA and prediction of protein tertiary structures using AlphaFold2.