GENERATION OF NOVEL ANTIBODY CANDIDATES USING TRANSFORMER AND GAN-BASED DEEP LEARNING ARTIFICIAL INTELLIGENCE

Hongyu Zhang, Xiao-De Lyu, Qi-An Zhao, Bo Liu
DOI: 10.1093/abt/tbad014.014 (https://doi.org/10.1093/abt/tbad014.014). Published: 2023-07-01. Publication type: Journal Article.

Abstract

Introduction: Conventional library-based antibody display explores only a small fraction of the sequences generated by animal immunization and falls well short of exhausting the sequence diversity that could be turned into antibody therapies. Screening is limited to sequences that can be displayed, which are only a subset of all sequences produced by B cells, while screening directly from single B cells can be costly. Here we introduce an artificial intelligence (AI)-enabled tool that navigates antibody discovery across a broader search space at reduced cost. We trained a transformer-based model on sequences from an immunized library to cluster the clones, and a generative adversarial network (GAN)-based model to generate novel sequences that could potentially be developed into antibody therapies.

Background and significance: One limitation in early antibody discovery is the number of functional candidates that can be selected. Our work provides an AI-enabled tool to discover and generate a panel of antibodies with differentiated binding strengths against a broad range of epitopes, to ensure functional coverage.

Methods & Results: Using next-generation sequencing, we extracted 10^4 sequences from a FACS-enriched yeast pool derived from a fully immunized alpaca (Lama pacos), from which we assembled 10^3 unique single-domain antibody (sdAb) sequences. We fine-tuned a transformer-based deep learning model, previously trained on our dataset of 100,000 antibody sequences, on these pre-processed sdAb sequences. The resulting representation correlates with sequence homology and was used to cluster clonal types. We postulate that this representation also encodes long-range amino acid interactions in the 3D structure, which would make the clustering more accurate than bioinformatics-based primary-sequence homology analysis. The process is fully automated and optimized to require minimal computational resources.
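
The primary-sequence homology analysis that the embedding-based clustering is compared against can be illustrated with a small sketch. This is not the authors' pipeline and does not involve the transformer model: it simply links sequences whose edit-distance identity exceeds a cutoff and reads clusters off the resulting groups. The function names, the 0.8 threshold, and the toy CDR3-like strings are all assumptions made for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two amino acid strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def identity(a: str, b: str) -> float:
    """Identity defined here as 1 - edit_distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def cluster_by_homology(seqs: list[str], threshold: float = 0.8) -> list[list[str]]:
    """Single-linkage clustering: any pair with identity >= threshold is joined."""
    parent = list(range(len(seqs)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if identity(seqs[i], seqs[j]) >= threshold:
            parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for idx, seq in enumerate(seqs):
        groups.setdefault(find(idx), []).append(seq)
    return list(groups.values())


# Made-up CDR3-like strings, for illustration only (not data from the study).
toy_cdr3 = ["AAGGYSTW", "AAGGYSSW", "AAGGFSTW", "NVRDLPYEY", "NVRDLPYDY"]
clusters = cluster_by_homology(toy_cdr3, threshold=0.8)
```

A baseline like this sees only residue-by-residue differences in the primary sequence, which is the limitation the learned representation is postulated to overcome.
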
We selected 15 candidates from the AI-clustered clonal groups and experimentally measured their binding activity. Twelve candidates had a Kd on the order of 10^-9, one had a Kd on the order of 10^-8, and the rest were non-binding (13 of 15 bound, hence a hit rate of 87%). The large sequence diversity of their CDR3 regions suggests that these nanobodies are potentially good binders for a wide range of epitopes. For each binding candidate, we generated a CDR-diversified virtual library (10^3) by training a GAN-based model on sequences from the same clonal group as the binder. This method incorporates the probability of each amino acid residue at each position, providing a more precise mutagenesis route than PCR-based affinity maturation. The generated sequences provided wider CDR sequence diversity for selecting antibodies with differentiated affinities and epitopes, which could yield candidates with different functionality.

Conclusion: Antibody discovery is a central step in early drug development, and identifying a wide range of functional candidates could increase the success rate and reduce risk in later development. We built an AI-enabled tool for searching and generating functional antibodies from an animal immunization library. We believe this technology will help deliver candidates with fine-tuned affinity and functionality.
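
The idea of using per-position residue probabilities from a clonal group to guide diversification, mentioned in the Methods & Results, can be illustrated with a far simpler stand-in than a GAN. The sketch below builds a position-specific frequency profile from equal-length sequences and samples new variants from it, treating positions independently. It is not the authors' model; the function names and the toy sequences are hypothetical, and real sdAb sequences would first need to be aligned.

```python
import random
from collections import Counter


def position_probabilities(aligned: list[str]) -> list[dict[str, float]]:
    """Per-position residue frequencies from equal-length (aligned) sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


def sample_variants(profile: list[dict[str, float]], n: int, seed: int = 0) -> list[str]:
    """Draw n sequences, sampling each position independently from the profile."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    variants = []
    for _ in range(n):
        residues = [
            rng.choices(list(col.keys()), weights=list(col.values()), k=1)[0]
            for col in profile
        ]
        variants.append("".join(residues))
    return variants


# Made-up sequences standing in for one clonal group (not data from the study).
toy_group = ["AAGGYSTW", "AAGGYSSW", "AAGGFSTW", "AAGGYSTW"]
profile = position_probabilities(toy_group)
virtual_library = sample_variants(profile, n=10)
```

Sampling positions independently ignores correlations between residues, which is one reason a generative model trained on the whole clonal group may be preferred; the sketch only shows how position-specific probabilities constrain which mutations are proposed, in contrast to untargeted PCR-based mutagenesis.
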
Source journal: Antibody Therapeutics (Medicine: Immunology and Allergy)