Active learning for improving out-of-distribution lab-in-the-loop experimental design
Daria Balashova, Robert Frank, Svetlana Kuzyakina, Dominique Weltevreden, Philippe A. Robert, Geir Kjetil Sandve, Victor Greiff
Immunoinformatics (Amsterdam, Netherlands), volume 21, Article 100065. Published 2026-03-01 (Epub 2026-01-21).
DOI: 10.1016/j.immuno.2026.100065
https://www.sciencedirect.com/science/article/pii/S2667119026000017
Abstract
The accurate prediction of antibody-antigen binding is crucial for developing antibody-based therapeutics and advancing immunological research. Library-on-library approaches, in which many antigens are probed against many antibodies, can identify specific interacting pairs. Machine learning models can predict target binding by analyzing the many-to-many relationships between antibodies and antigens. However, these models struggle to predict interactions when the test antibodies and antigens are not represented in the training data, a scenario known as out-of-distribution prediction. Generating experimental binding data is costly, which limits the availability of comprehensive datasets. Active learning can reduce costs by starting with a small labeled subset of the data and iteratively expanding the labeled dataset. Few active learning approaches can handle data with many-to-many relationships, such as those obtained from library-on-library screening. In this study, we adapted twelve active learning strategies for antibody-antigen binding prediction in a library-on-library setting and evaluated their out-of-distribution performance using the Absolut! simulation framework. We found that three of the twelve algorithms tested modestly but significantly outperformed the baseline in which randomly chosen data are iteratively labeled. The best algorithm reduced the number of required antigen mutant variants by up to 12.5% compared to the random baseline. These findings demonstrate that active learning can improve experimental efficiency in a library-on-library setting and advance antibody-antigen binding prediction.
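The iterative labeling loop described in the abstract can be illustrated with a minimal Python sketch. It is not one of the twelve strategies from the study and does not use the Absolut! framework. The pair encoding, the synthetic `oracle`, the toy uncertainty estimate, and all function names are assumptions made for illustration only. The sketch starts from a small random labeled subset and then either labels random pairs (the baseline) or labels the pairs whose estimated binding probability is closest to 0.5.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Pair = Tuple[int, int]  # (antibody index, antigen index); hypothetical encoding


def random_acquisition(unlabeled: Sequence[Pair], labeled: Dict[Pair, int],
                       batch_size: int, rng: random.Random) -> List[Pair]:
    """Baseline: pick a uniformly random batch of unlabeled pairs."""
    return rng.sample(list(unlabeled), min(batch_size, len(unlabeled)))


def uncertainty_acquisition(unlabeled: Sequence[Pair], labeled: Dict[Pair, int],
                            batch_size: int, rng: random.Random) -> List[Pair]:
    """Toy uncertainty sampling (an assumption, not the paper's method).

    P(bind) for a pair is estimated as the mean label of already labeled
    pairs that share its antibody or its antigen; pairs whose estimate is
    closest to 0.5 are selected first.
    """
    def p_bind(pair: Pair) -> float:
        ab, ag = pair
        related = [y for (a, g), y in labeled.items() if a == ab or g == ag]
        return sum(related) / len(related) if related else 0.5

    ranked = sorted(unlabeled, key=lambda p: abs(p_bind(p) - 0.5))
    return ranked[:batch_size]


def active_learning_loop(pool: Sequence[Pair],
                         oracle: Callable[[Pair], int],
                         acquire: Callable[..., List[Pair]],
                         n_init: int, batch_size: int, n_rounds: int,
                         seed: int = 0) -> Dict[Pair, int]:
    """Start from a small random labeled subset, then expand it in rounds."""
    rng = random.Random(seed)
    unlabeled = list(pool)
    rng.shuffle(unlabeled)

    # Initial labeled subset: the first n_init pairs after shuffling.
    labeled = {pair: oracle(pair) for pair in unlabeled[:n_init]}
    unlabeled = unlabeled[n_init:]

    for _ in range(n_rounds):
        if not unlabeled:
            break
        batch = acquire(unlabeled, labeled, batch_size, rng)
        for pair in batch:
            labeled[pair] = oracle(pair)  # stands in for the costly experiment
        chosen = set(batch)
        unlabeled = [p for p in unlabeled if p not in chosen]
    return labeled


# Synthetic 10 x 10 library-on-library pool with a made-up binding rule.
pool = [(ab, ag) for ab in range(10) for ag in range(10)]
oracle = lambda pair: int((pair[0] + pair[1]) % 3 == 0)

labeled = active_learning_loop(pool, oracle, uncertainty_acquisition,
                               n_init=10, batch_size=5, n_rounds=4)
print(len(labeled))  # 30 pairs labeled: 10 initial + 4 rounds of 5
```

Passing `random_acquisition` instead of `uncertainty_acquisition` gives the random baseline under the same labeling budget. A real comparison would retrain a binding predictor after each round and score it on held-out antibodies and antigens.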