ProtAgents: protein discovery via large language model multi-agent collaborations combining physics and machine learning†

Impact factor: 6.2 (JCR Q1, Chemistry, Multidisciplinary)

Digital discovery, Pub Date: 2024-05-17, DOI: 10.1039/D4DD00013G

Alireza Ghafarollahi and Markus J. Buehler

Journal: Digital discovery, volume 7, pages 1389-1409. Publication type: Journal Article. Article page: https://pubs.rsc.org/en/content/articlelanding/2024/dd/d4dd00013g

Citations: 0

Abstract

Designing de novo proteins beyond those found in nature holds significant promise for both scientific and engineering applications. Current methodologies for protein design often rely on AI-based models, such as surrogate models that address end-to-end problems by linking protein structure to material properties or vice versa. However, these models frequently focus on specific material objectives or structural properties, which limits their flexibility when out-of-domain knowledge must be incorporated into the design process or when comprehensive data analysis is required. In this study, we introduce ProtAgents, a platform for de novo protein design based on Large Language Models (LLMs), where multiple AI agents with distinct capabilities collaboratively address complex tasks within a dynamic environment. The versatility in agent development allows for expertise in diverse domains, including knowledge retrieval, protein structure analysis, physics-based simulations, and results analysis. The dynamic collaboration between agents, empowered by LLMs, provides a versatile approach to tackling protein design and analysis problems, as demonstrated through diverse examples in this study. The problems of interest encompass designing new proteins, analyzing protein structures, and obtaining new first-principles data (natural vibrational frequencies) via physics simulations. The concerted effort of the system allows for powerful automated and synergistic design of de novo proteins with targeted mechanical properties. The flexibility in designing the agents, on the one hand, and their capacity for autonomous collaboration through the dynamic LLM-based multi-agent environment, on the other, unleash the great potential of LLMs in addressing multi-objective materials problems and open up new avenues for autonomous materials discovery and design.




Source journal

Digital discovery

CiteScore

2.80

Self-citation rate

0.00%

Number of articles published