Xiongwen Li, Zhu Liang, Zhetao Guo, Ziyi Liu, Ke Wu, Jiahao Luo, Yuesheng Zhang, Lizheng Liu, Manda Sun, Yuanyuan Huang, Hongting Tang, Yu Chen, Tao Yu, Jens Nielsen, Feiran Li
bioRxiv - Systems Biology, 2024-09-13. doi: 10.1101/2024.09.09.612023 (https://doi.org/10.1101/2024.09.09.612023)
Leveraging large language models for metabolic engineering design
Establishing efficient cell factories requires repeated trial and error because metabolism is intricate, which makes predicting effective engineering targets challenging. It is therefore vital to learn from the accumulated successes of previous designs when developing future cell factories. In this study, we developed a method based on large language models (LLMs) to extract metabolic engineering strategies from research articles at scale. We created a database containing over 29,006 metabolic engineering entries, 1,210 products, and 751 organisms. Using the extracted data, we trained a hybrid model that combines deep learning and mechanistic approaches to predict engineering targets. Our model outperformed traditional metabolic engineering target prediction algorithms, excelled at predicting the effects of gene modifications, and generalized well to out-of-distribution products and multiple gene combinations. Our study provides a valuable dataset, a chatbot, and an engineering target prediction model for the metabolic engineering field, and exemplifies an efficient method for leveraging existing knowledge for future predictions.
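To make the idea of a database of extracted engineering entries more concrete, the following is a minimal sketch of how one such entry might be represented and summarized. The abstract does not describe the actual schema, so every field name (`product`, `organism`, `gene`, `modification`, `effect`), both helper functions, and the toy records are assumptions made purely for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EngineeringEntry:
    """One extracted strategy. All field names are hypothetical."""
    product: str
    organism: str
    gene: str
    modification: str  # assumed labels, e.g. "overexpression", "knockout"
    effect: str        # assumed labels, e.g. "increase", "decrease"


def summarize(entries):
    """Count entries, distinct products, and distinct organisms."""
    return {
        "entries": len(entries),
        "products": len({e.product for e in entries}),
        "organisms": len({e.organism for e in entries}),
    }


def effect_counts(entries, product):
    """Tally (gene, modification, effect) triples reported for one product."""
    counts = Counter()
    for e in entries:
        if e.product == product:
            counts[(e.gene, e.modification, e.effect)] += 1
    return counts


# Toy records with placeholder names, invented for illustration only.
# They are not taken from the paper's database.
toy = [
    EngineeringEntry("product_A", "organism_X", "gene1", "overexpression", "increase"),
    EngineeringEntry("product_A", "organism_X", "gene2", "knockout", "increase"),
    EngineeringEntry("product_B", "organism_Y", "gene1", "overexpression", "decrease"),
]

print(summarize(toy))
print(effect_counts(toy, "product_A"))
```

A flat record like this is only one possible design. It would make the headline counts (entries, products, organisms) simple aggregations, and the per-product tallies are the kind of signal a target prediction model could plausibly be trained on.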