Leveraging large language models to predict antibody biological activity against influenza A hemagglutinin.

IF 4.4, Zone 2 (Biology), Q2 BIOCHEMISTRY & MOLECULAR BIOLOGY

Computational and structural biotechnology journal Pub Date : 2025-03-24 eCollection Date: 2025-01-01 DOI:10.1016/j.csbj.2025.03.038

Ella Barkan, Ibrahim Siddiqui, Kevin J Cheng, Alex Golts, Yoel Shoshan, Jeffrey K Weber, Yailin Campos Mota, Michal Ozery-Flato, Giuseppe A Sautto

Volume 27, pages 1286-1295. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11995015/pdf/

Citations: 0

Abstract


Leveraging large language models to predict antibody biological activity against influenza A hemagglutinin.

Monoclonal antibodies (mAbs) are among the most prevalent FDA-approved treatments for autoimmune diseases, infectious diseases, and cancer. However, their discovery and development remain time-consuming and costly. Recent advances in machine learning (ML) and artificial intelligence (AI) have shown significant promise for revolutionizing the antibody discovery field. Models that predict antibody biological activity enable in silico evaluation of binding and functional properties; such models can prioritize the antibodies most likely to succeed in laboratory testing. We explore an AI model for predicting the binding and receptor-blocking activity of antibodies against influenza A hemagglutinin (HA) antigens. Our model is developed with the Molecular Aligned Multi-Modal Architecture and Language (MAMMAL) framework for biologics discovery and predicts antibody-antigen interactions using only sequence information. To evaluate the model's performance, we tested it under various data split conditions that mimic real-world scenarios. Our model achieved an area under the receiver operating characteristic curve (AUROC) of ≥ 0.91 for predicting the activity of existing antibodies against seen HAs and an AUROC of 0.9 for unseen HAs. For novel antibody activity prediction, the AUROC was 0.73, which further declined to 0.63-0.66 under stringent constraints on similarity to existing antibodies. These results demonstrate the potential of AI foundation models to transform antibody design by reducing dependence on extensive laboratory testing and enabling more efficient prioritization of antibody candidates. Moreover, our findings emphasize the critical importance of diverse and comprehensive antibody datasets for improving the generalization of prediction models, particularly for novel antibody development.
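
The evaluation idea in the abstract (scoring with AUROC under splits that hold out whole antibodies or whole antigens) can be illustrated with a small sketch. This is not the authors' code and does not use the MAMMAL framework; the function names, the toy identifiers (`mAb1`, `HA_H1`, and so on), and the labels are all hypothetical and chosen only for illustration. AUROC is computed here by its pairwise definition: the fraction of positive/negative pairs in which the positive example receives the higher score, with ties counted as one half.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record: (antibody_id, antigen_id, label), label 1 = active, 0 = inactive.
Pair = Tuple[str, str, int]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def holdout_by_group(
    pairs: List[Pair], key_index: int, test_fraction: float, seed: int = 0
) -> Tuple[List[Pair], List[Pair]]:
    """Split so each group (antibody if key_index=0, antigen if key_index=1)
    falls entirely in train or entirely in test."""
    groups = sorted({p[key_index] for p in pairs})
    rng = random.Random(seed)
    rng.shuffle(groups)
    n_test = max(1, round(len(groups) * test_fraction))
    test_groups = set(groups[:n_test])
    train = [p for p in pairs if p[key_index] not in test_groups]
    test = [p for p in pairs if p[key_index] in test_groups]
    return train, test


# Toy data (made up): four antibodies tested against two HA antigens.
toy_pairs: List[Pair] = [
    ("mAb1", "HA_H1", 1), ("mAb1", "HA_H3", 0),
    ("mAb2", "HA_H1", 0), ("mAb2", "HA_H3", 1),
    ("mAb3", "HA_H1", 1), ("mAb4", "HA_H3", 0),
]

# "Novel antibody" setting: no test antibody appears in training.
train_pairs, test_pairs = holdout_by_group(toy_pairs, key_index=0, test_fraction=0.25)

# AUROC on made-up scores: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Passing `key_index=1` instead holds out whole antigens, which corresponds to the "unseen HAs" setting. The stricter setting that limits similarity to existing antibodies would additionally need a sequence-similarity filter, which this sketch leaves out because the abstract does not say how similarity was measured.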


Source journal

Computational and structural biotechnology journal Biochemistry, Genetics and Molecular Biology-Biophysics

CiteScore: 9.30
Self-citation rate: 3.30%
Articles published: 540
Review time: 6 weeks

About the journal: Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to: Structure and function of proteins, nucleic acids and other macromolecules; Structure and function of multi-component complexes; Protein folding, processing and degradation; Enzymology; Computational and structural studies of plant systems; Microbial Informatics; Genomics; Proteomics; Metabolomics; Algorithms and Hypothesis in Bioinformatics; Mathematical and Theoretical Biology; Computational Chemistry and Drug Discovery; Microscopy and Molecular Imaging; Nanotechnology; Systems and Synthetic Biology.