GENERATION OF NOVEL ANTIBODY CANDIDATES USING TRANSFORMER AND GAN-BASED DEEP LEARNING ARTIFICIAL INTELLIGENCE
Authors: Hongyu Zhang, Xiao-De Lyu, Qi-An Zhao, Bo Liu
Journal: Antibody Therapeutics (Journal Article), published 2023-07-01
DOI: 10.1093/abt/tbad014.014 (https://doi.org/10.1093/abt/tbad014.014)
Citations: 0
Abstract
Introduction
Conventional library-based antibody display explores only a small fraction of the sequences generated by animal immunization, and far less than the potential sequence diversity that could be turned into antibody therapies. Screening is limited to sequences that can be displayed, which are only a subset of the sequences generated by B cells, while screening antibodies directly from single B cells can be costly. Here, we introduce a novel artificial intelligence (AI)-enabled tool that navigates antibody discovery over a broader search space at reduced cost. We trained a transformer-based model on sequences from an immunized library to cluster the clones, and a generative adversarial network (GAN)-based model to generate novel sequences that could potentially be developed into antibody therapies.

Background and significance
One limitation in early antibody discovery is the number of functional candidates that can be selected. Our work provides an AI-enabled tool to discover and generate a panel of antibodies with differentiated binding strengths against a broad range of epitopes, to ensure functional coverage.

Methods & Results
We extracted 10^4 sequences by next-generation sequencing from the FACS-enriched yeast pool derived from a fully immunized alpaca (Lama pacos), from which we assembled 10^3 unique single-domain antibody (sdAb) sequences. We fine-tuned a transformer-based deep learning model, previously trained on our dataset of 100,000 antibody sequences, on these pre-processed sdAb sequences. The resulting representation correlates with sequence homology and was used to cluster clonal types. We postulate that this representation also encodes long-range amino acid interactions in the 3D structure, so that its clustering accuracy exceeds that of bioinformatics-based primary sequence homology analysis. The process is fully automated and optimized to require minimal computational resources.

We selected 15 candidates from the AI-clustered clonal groups and measured their binding activity experimentally. Twelve candidates had a Kd on the order of 10^-9 M, one had a Kd on the order of 10^-8 M, and the rest were non-binding, giving a hit rate of 87% (13 of 15). The large sequence diversity of the CDR3 regions suggests that these nanobodies are potentially good binders for a wide range of epitopes.

For each binding candidate, we generated a CDR-diversifying virtual library (10^3 sequences) by training a GAN-based model on the sequences from the same clonal group as the binder. This method incorporates the probability of each amino acid residue at each position, providing a more precise mutagenesis route than PCR-based affinity maturation. The generated sequences provided wider CDR sequence diversity for selecting antibodies of differentiated affinity and epitope specificity, which could yield candidates with different functionality.

Conclusion
Antibody discovery is a central step in early drug development: identifying a wide range of functional candidates could increase the success rate and reduce risk in later development. We built an AI-enabled tool for searching and generating functional antibodies from an animal immunization library. We believe this technology will help deliver candidates with fine-tuned affinity and functionality.
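
Illustrative sketch
The abstract does not describe the GAN architecture, so the following is not the authors' method. It is a minimal Python sketch of the simpler idea the abstract mentions, namely using the probability of each amino acid at each position within a clonal group to propose CDR variants. A plain per-position frequency sampler stands in for the GAN here. The sequences, the function names (position_probabilities, sample_variant), and the library size are all hypothetical and chosen only for illustration. The last lines reproduce the reported hit-rate arithmetic.

```python
import random
from collections import Counter


def position_probabilities(cdr_sequences):
    """Return per-position amino acid frequencies for equal-length sequences."""
    length = len(cdr_sequences[0])
    if any(len(s) != length for s in cdr_sequences):
        raise ValueError("all sequences must have the same length")
    probs = []
    for i in range(length):
        counts = Counter(s[i] for s in cdr_sequences)
        total = sum(counts.values())
        probs.append({aa: n / total for aa, n in counts.items()})
    return probs


def sample_variant(probs, rng):
    """Draw one sequence by sampling every position independently."""
    return "".join(
        rng.choices(list(p.keys()), weights=list(p.values()), k=1)[0]
        for p in probs
    )


# Hypothetical CDR3 sequences from one clonal group (made up, not from the paper).
clonal_group = ["AAGYSTW", "AAGYSSW", "AVGYTTW", "AAGFSTW"]
probs = position_probabilities(clonal_group)

# Build a small virtual library; a set removes duplicate draws.
rng = random.Random(0)  # fixed seed so the sketch is reproducible
virtual_library = {sample_variant(probs, rng) for _ in range(50)}

# Hit rate from the abstract: 12 + 1 binders among 15 tested candidates.
hit_rate = (12 + 1) / 15
print(f"hit rate: {hit_rate:.0%}")
```

Unlike a GAN, this sampler treats positions as independent, so it cannot capture correlations between residues. Positions that are conserved across the clonal group (for example the first and last residues above) stay fixed in every sampled variant.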