Active learning for improving out-of-distribution lab-in-the-loop experimental design

Immunoinformatics (Amsterdam, Netherlands), Volume 21, Article 100065. Pub Date: 2026-03-01. Epub Date: 2026-01-21. DOI: 10.1016/j.immuno.2026.100065

Daria Balashova, Robert Frank, Svetlana Kuzyakina, Dominique Weltevreden, Philippe A. Robert, Geir Kjetil Sandve, Victor Greiff



Abstract

The accurate prediction of antibody-antigen binding is crucial for developing antibody-based therapeutics and advancing immunological research. Library-on-library approaches, in which many antigens are probed against many antibodies, can identify specific interacting pairs. Machine learning models can predict target binding by analyzing the many-to-many relationships between antibodies and antigens. However, these models struggle when the test antibodies and antigens are not represented in the training data, a scenario known as out-of-distribution prediction. Generating experimental binding data is costly, which limits the availability of comprehensive datasets. Active learning can reduce costs by starting with a small labeled subset of the data and iteratively expanding the labeled dataset. Few active learning approaches can handle data with many-to-many relationships, such as those obtained from library-on-library screening. In this study, we adapted twelve active learning strategies for antibody-antigen binding prediction in a library-on-library setting and evaluated their out-of-distribution performance using the Absolut! simulation framework. We found that three of the twelve algorithms outperformed, modestly but significantly, the baseline in which randomly selected data are iteratively labeled. The best algorithm reduced the number of required antigen mutant variants by up to 12.5% compared to the random baseline. These findings demonstrate that active learning can improve experimental efficiency in a library-on-library setting and advance antibody-antigen binding prediction.
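The loop described in the abstract (start from a small labeled subset, then iteratively choose what to label next) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy sequences, the `toy_oracle` stand-in for a binding experiment, and `diversity_strategy`, a generic farthest-first heuristic that is not claimed to be one of the twelve strategies evaluated. The sketch also assumes that labeling an antigen variant yields its binding label against every antibody in the library, and it omits model retraining between rounds.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence

Labels = Dict[str, List[int]]  # antigen sequence -> one binding label per antibody


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def random_strategy(unlabeled, labeled, batch_size, rng):
    """Baseline: choose the next antigens uniformly at random."""
    return rng.sample(unlabeled, min(batch_size, len(unlabeled)))


def diversity_strategy(unlabeled, labeled, batch_size, rng):
    """Hypothetical strategy: greedily pick the antigen farthest (in Hamming
    distance) from everything labeled or already picked in this batch."""
    reference = list(labeled)
    candidates = list(unlabeled)
    batch = []
    while candidates and len(batch) < batch_size:
        best = max(candidates,
                   key=lambda c: min((hamming(c, r) for r in reference), default=0))
        batch.append(best)
        reference.append(best)
        candidates.remove(best)
    return batch


def active_learning_loop(pool: Sequence[str],
                         oracle: Callable[[str], List[int]],
                         strategy: Callable[..., List[str]],
                         n_init: int, batch_size: int, n_rounds: int,
                         seed: int = 0) -> Labels:
    """Label n_init random antigens, then add one batch per round."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)
    # Small initial labeled subset.
    labeled: Labels = {a: oracle(a) for a in unlabeled[:n_init]}
    unlabeled = unlabeled[n_init:]
    for _ in range(n_rounds):
        if not unlabeled:
            break
        # A real pipeline would retrain the binding model on `labeled` here.
        batch = strategy(unlabeled, labeled, batch_size, rng)
        for antigen in batch:
            labeled[antigen] = oracle(antigen)  # "run the experiment"
        chosen = set(batch)
        unlabeled = [a for a in unlabeled if a not in chosen]
    return labeled


def relative_reduction(baseline_n: int, active_n: int) -> float:
    """Fraction of labeled variants saved relative to the random baseline."""
    return (baseline_n - active_n) / baseline_n


# Toy library-on-library setup (illustrative only).
ANTIBODIES = ["ACDA", "CCDD", "AADD"]
POOL = ["".join(p) for p in itertools.product("ACD", repeat=4)]  # 81 variants


def toy_oracle(antigen: str) -> List[int]:
    """Toy rule: an antibody 'binds' if at least 2 positions match."""
    return [int(sum(x == y for x, y in zip(antigen, ab)) >= 2) for ab in ANTIBODIES]


labeled = active_learning_loop(POOL, toy_oracle, diversity_strategy,
                               n_init=3, batch_size=4, n_rounds=5)
```

As a reading aid for the reported figure: under this definition, a strategy that reaches a target performance with 350 labeled variants where random labeling needs 400 gives `relative_reduction(400, 350) == 0.125`, that is, a 12.5% reduction. The numbers 400 and 350 are made up for the arithmetic and do not come from the study.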

