A roadmap for exploring the untouched protein space for biology and medicine
Jun Wang
hLife, Volume 1, Issue 2, Pages 93-97, December 2023. DOI: 10.1016/j.hlife.2023.06.001
Proteins are the major carriers of biological processes, and the extant proteome contains tremendous diversity. However, the theoretical diversity of proteins vastly exceeds that of the proteins currently known, largely because of evolutionary constraints. Here, we propose that the untouched protein space, comprising both extant proteins of unknown function and unnatural proteins, may contain many proteins with desired functions, and we outline a roadmap for exploring this space with artificial intelligence. In particular, using methods developed in natural language processing (NLP), we can first identify large numbers of functional proteins and peptides encoded in biological big data, such as microbiome and virome data. Second, NLP can facilitate larger-scale mutagenesis and directed evolution to improve the functions of known proteins. Finally, sampling random sequences and applying NLP to them might reveal a more complete landscape of protein functions and enable de novo protein design.
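To make the gap between known and theoretical protein space concrete, the following is a minimal Python sketch, not the author's method. It assumes only the 20 canonical amino acids and uniform sampling, and it illustrates the size of sequence space (20^L sequences of length L), the enumeration of single-substitution variants as raw candidates for the mutation step, and the drawing of random sequences as in the final step. The function names are hypothetical, and the NLP models that would score or annotate these candidates are deliberately left out.

```python
import math
import random

# The 20 canonical amino acids (one-letter codes); non-canonical residues are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_space_log10(length: int) -> float:
    """Return log10 of the number of distinct sequences of this length (20**length)."""
    return length * math.log10(len(AMINO_ACIDS))


def single_mutants(seq: str) -> list[str]:
    """Return every sequence that differs from `seq` by exactly one substitution.

    For a sequence made of canonical residues this yields 19 * len(seq) variants.
    """
    variants = []
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                variants.append(seq[:i] + aa + seq[i + 1:])
    return variants


def random_sequences(n: int, length: int, seed: int = 0) -> list[str]:
    """Draw n sequences uniformly at random from the 20-letter alphabet (reproducible via seed)."""
    rng = random.Random(seed)
    return ["".join(rng.choices(AMINO_ACIDS, k=length)) for _ in range(n)]


if __name__ == "__main__":
    # A 100-residue protein already has 20**100 possible sequences, roughly 1.3e130.
    print(f"log10(20**100) = {sequence_space_log10(100):.1f}")
    # 8 residues * 19 alternatives = 152 single mutants.
    print(len(single_mutants("MKTAYIAK")))
    print(random_sequences(3, 12))
```

For scale, exhaustive single-substitution scanning of a 300-residue protein produces 19 × 300 = 5,700 variants, a tiny neighborhood compared with the 20^300 sequences of that length.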