In silico design of smaller size enzymatic protein by generative artificial intelligence (ProtGPT2)
Authors: Hiroyuki Hamada, Tamon Matsuzawa, Taizo Hanai
Journal: Journal of Bioscience and Bioengineering, pp. 174-179 (JCR Q3, Biotechnology & Applied Microbiology; impact factor 2.9)
Published: 2025-09-01 (Epub 2025-07-10)
DOI: 10.1016/j.jbiosc.2025.06.009 (https://doi.org/10.1016/j.jbiosc.2025.06.009)
Citations: 0
Abstract
The construction of small proteins by removing amino acid subsequences that are not involved in function, activity, or structure is crucial for bioprocessing and drug development. Traditional design methods often focus on reconstructing functional motifs, but they face challenges in stabilizing structure and reproducing function. In this study, we aimed to develop a design method for small proteins using ProtGPT2, a model that generates protein sequences based on function and structure. First, amino acid sequence data for malate dehydrogenase (MDH) were collected, and ProtGPT2 was fine-tuned on them (ProtGPT2 for MDH). The chain length and perplexity (ppl) of the generated sequences were then evaluated; the generated sequences were shorter than the natural ones. The validity of the generated sequences was assessed using both population and individual analyses. Population analysis, comprising multiple sequence alignment (MSA) and t-distributed stochastic neighbor embedding (tSNE), revealed that ProtGPT2 for MDH identified functional motifs of MDH and incorporated them into the generated sequences. Additionally, tSNE showed that the generated sequences were highly similar to natural MDH sequences. In individual analysis, 10 randomly selected sequences were evaluated using BLAST, AlphaFold2, and InterPro. BLAST indicated that 9 of the 10 sequences were novel MDH variants. AlphaFold2 predicted 3D structures that were highly similar to known MDH structures. InterPro identified domains and active sites in 2 of the sequences, suggesting that they were novel, small MDH variants. In conclusion, ProtGPT2 for MDH has the potential to design amino acid sequence candidates for small MDHs. The validity and utility of the model will be established through future experimental efforts.
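The abstract evaluates generated sequences by chain length and perplexity but gives no formula or thresholds. As a rough illustration only, the sketch below uses the standard language-model definition of perplexity (the exponential of the negative mean log-probability per token) and a simple length-and-perplexity filter. The function names, the cut-off values, and the idea of filtering on both criteria at once are assumptions made for this example, not the authors' procedure. The per-token log-probabilities are assumed to come from whatever model scored the sequence; because ProtGPT2 tokens can span several residues, their count need not equal the chain length.

```python
import math
from typing import List, Sequence, Tuple


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Return exp(-mean(log p)) for natural-log token probabilities."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def select_candidates(
    candidates: Sequence[Tuple[str, Sequence[float]]],
    max_length: int,   # hypothetical chain-length cut-off (residues)
    max_ppl: float,    # hypothetical perplexity cut-off
) -> List[Tuple[str, float]]:
    """Keep sequences that are short enough and have low perplexity.

    Each candidate is (amino_acid_sequence, token_log_probs).
    Returns (sequence, perplexity) pairs sorted by ascending perplexity.
    """
    kept = []
    for seq, log_probs in candidates:
        ppl = perplexity(log_probs)
        if len(seq) <= max_length and ppl <= max_ppl:
            kept.append((seq, ppl))
    return sorted(kept, key=lambda item: item[1])
```

For intuition, a model that assigned every token a probability of 1/20 (uniform over the 20 standard amino acids) would score a perplexity of 20 under this definition, so lower values mean the model finds the sequence more predictable.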
About the journal:
The Journal of Bioscience and Bioengineering is a research journal publishing original full-length research papers, reviews, and Letters to the Editor. The Journal is devoted to the advancement and dissemination of knowledge concerning fermentation technology, biochemical engineering, food technology and microbiology.