In silico design of smaller size enzymatic protein by generative artificial intelligence (ProtGPT2)

IF 2.9 | CAS Zone 4 (Biology) | JCR Q3, BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Hiroyuki Hamada, Tamon Matsuzawa, Taizo Hanai
Journal of Bioscience and Bioengineering, pp. 174-179. Published 2025-09-01 (online 2025-07-10). DOI: 10.1016/j.jbiosc.2025.06.009 (https://doi.org/10.1016/j.jbiosc.2025.06.009)
Citations: 0

Abstract

The construction of small proteins by removing amino acid subsequences that are not involved in function, activity, or structure is crucial for bioprocessing and drug development. Traditional design methods often focus on reconstructing functional motifs, but they face challenges in stabilizing structure and reproducing function. In this study, we aimed to develop a design method for small proteins using ProtGPT2, a model that generates protein sequences based on function and structure. First, amino acid sequence data of malate dehydrogenase (MDH) were collected and used to fine-tune ProtGPT2 (ProtGPT2 for MDH). Evaluation of the chain length and perplexity (ppl) of the generated sequences showed that the model produced sequences shorter than natural MDHs. The validity of the generated sequences was assessed by both population and individual analyses. Population analysis, including multiple sequence alignment (MSA) and t-distributed stochastic neighbor embedding (tSNE), revealed that ProtGPT2 for MDH identified functional motifs of MDH and incorporated them into the generated sequences. Additionally, tSNE showed that the generated sequences were highly similar to natural MDH sequences. In individual analysis, 10 randomly selected sequences were evaluated using BLAST, AlphaFold2, and InterPro. BLAST indicated that 9 of them were novel MDH variants. AlphaFold2 predictions indicated that their 3D structures were highly similar to known MDH structures. InterPro identified domains and active sites in 2 sequences, suggesting that they were novel, small MDH variants. In conclusion, ProtGPT2 for MDH has the potential to design amino acid sequence candidates for small MDHs. The validity and utility of the model remain to be established through future experiments.
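The abstract says the generated sequences were screened by chain length and perplexity but gives no formula or thresholds. As a minimal sketch, assuming the standard definition of perplexity (exp of the mean per-token negative log-likelihood) and assuming per-token log-probabilities have already been obtained from a causal language model such as the fine-tuned ProtGPT2, the Python below shows how such a length-and-perplexity screen could work. The `Generated` record, the `shorter_low_ppl` helper, the `max_ppl` cutoff, and all numbers in the demo are hypothetical; this is not the authors' pipeline.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Generated:
    """One generated candidate (hypothetical record, not the authors' format)."""
    sequence: str                # residues in one-letter code, FASTA line breaks removed
    token_logprobs: list[float]  # natural-log probability of each model token (assumed input)


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_logprobs:
        raise ValueError("perplexity needs at least one token")
    return math.exp(-mean(token_logprobs))


def shorter_low_ppl(
    candidates: list[Generated],
    natural_lengths: list[int],
    max_ppl: float,
) -> list[tuple[str, int, float]]:
    """Keep candidates shorter than the mean natural length with ppl <= max_ppl.

    Length is counted in residues, not model tokens, because a subword
    tokenizer may merge several residues into one token.
    Returns (sequence, length, ppl) tuples sorted by ascending perplexity.
    """
    ref_len = mean(natural_lengths)
    kept = []
    for cand in candidates:
        ppl = perplexity(cand.token_logprobs)
        if len(cand.sequence) < ref_len and ppl <= max_ppl:
            kept.append((cand.sequence, len(cand.sequence), ppl))
    return sorted(kept, key=lambda item: item[2])


if __name__ == "__main__":
    # Toy numbers only; real log-probabilities would come from the fine-tuned model.
    demo = [
        Generated("MKVAVLGAAGGIG", [math.log(0.5)] * 4),
        Generated("MKVAVLGAAGGIGQALALLLK", [math.log(0.25)] * 6),
    ]
    for seq, length, ppl in shorter_low_ppl(demo, [312, 320, 330], max_ppl=3.0):
        print(f"{seq}\tlen={length}\tppl={ppl:.2f}")
```

Lower perplexity means the model finds a sequence more probable. The `max_ppl=3.0` cutoff and the use of the mean natural length as the reference are arbitrary choices for illustration; the abstract does not state which thresholds, if any, were applied.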

Source journal
Journal of Bioscience and Bioengineering (Biology: Biotechnology & Applied Microbiology)
CiteScore: 5.90
Self-citation rate: 3.60%
Articles published: 144
Review time: 51 days
About the journal: The Journal of Bioscience and Bioengineering is a research journal publishing original full-length research papers, reviews, and Letters to the Editor. The Journal is devoted to the advancement and dissemination of knowledge concerning fermentation technology, biochemical engineering, food technology and microbiology.