Investigating structural biophysical features for antigen-binding fragment crystallization via machine learning

Impact factor 3.2; Zone 3 (Engineering & Technology); JCR Q2 (Chemistry, Physical)
Krishna Gopal Chattaraj, Joana Ferreira, Allan S. Myerson and Bernhardt L. Trout
DOI: 10.1039/D4ME00187G
Journal: Molecular Systems Design & Engineering, 5, 377-393
Publication date: 2025-03-04
Publication type: Journal Article
Article page: https://pubs.rsc.org/en/content/articlelanding/2025/me/d4me00187g
PDF: https://pubs.rsc.org/en/content/articlepdf/2025/me/d4me00187g?page=search
Citations: 0

Abstract

Investigating structural biophysical features for antigen-binding fragment crystallization via machine learning†

Antibody-based therapeutics continue to be an important pharmaceutical development modality. Crystallization of antibodies is important for structural characterization, and it also has potential as a separation method and as a dosage form. Nevertheless, controlled crystallization of an antibody remains challenging because of its large size, its high degree of segmental flexibility, and the intricacy of the interactions involved (e.g., protein–protein and protein–solvent interactions). Methods that predict important contact sites could help in developing such crystallization processes, but limited data and understanding have so far prevented the development of robust predictive methods. To address that gap, this study combines machine learning with in silico modelling of crystal structures, built from available experimental structures, to identify the physicochemical features that are crucial for successful antibody crystallization. The developed method distinguishes crystal-site residues from non-crystal-site residues with good accuracy. Each residue is treated as a distinct data point and is characterized by a set of 510 descriptors. Moreover, new algorithms have been developed to design novel descriptors that improve the model's predictive capabilities. Fragment antigen-binding (Fab) regions are investigated because crystal structures of full-length monoclonal antibodies (mAbs) are scarce. The findings show that the extreme gradient boosting (XGBoost) algorithm effectively identifies crystal-site residues, as evidenced by an AUPRC (area under the precision-recall curve) more than 3-fold higher than that of the baseline model. The top-ranked descriptors indicate that crystal-site residues are primarily solvent-exposed residues with high spatial aggregation propensity (SAP), which signifies hydrophobic patches, together with their immediate surface-exposed neighbors. Moreover, these high-SAP residues are often surrounded by other solvent-exposed residues that are polar, charged, or both. In contrast, residues not involved in crystal interfaces generally lack these essential features, though some might be excluded due to specific crystal lattice arrangements. Additionally, reducing the feature set from 510 descriptors to the top 15% in the XGBoost model yields similar performance while significantly simplifying the model.
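
The abstract reports model quality as an AUPRC relative to a baseline. As a rough illustration of what such a fold comparison means, the sketch below computes a step-wise AUPRC (average precision) in plain Python and divides it by a no-skill reference. Everything in it is an assumption made for illustration: the abstract does not say how the baseline model was built, so the sketch uses the positive-class prevalence (the expected AUPRC of a random ranking), and the labels and scores are invented toy values, not data from the study.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    labels: 1 for a positive (here: crystal-site residue), 0 otherwise.
    scores: model scores, higher means more likely positive.
    Tied scores are not treated specially (a simplification).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    # Rank residues from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def prevalence_baseline(labels: Sequence[int]) -> float:
    """Assumed no-skill reference: the fraction of positive labels."""
    return sum(labels) / len(labels)


# Hypothetical toy data: 2 crystal-site residues among 10.
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

ap = average_precision(toy_labels, toy_scores)
baseline = prevalence_baseline(toy_labels)
fold = ap / baseline
print(f"AUPRC={ap:.3f} baseline={baseline:.3f} fold={fold:.2f}")
```

Reporting AUPRC as a multiple of a baseline is informative when positives are rare, as crystal-site residues plausibly are among all residues, because the no-skill value then sits well below 0.5 and a raw AUPRC is hard to judge on its own.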

Source journal
Molecular Systems Design & Engineering (Engineering: Biomedical Engineering)
CiteScore: 6.40
Self-citation rate: 2.80%
Articles published: 144
About the journal: Molecular Systems Design & Engineering provides a hub for cutting-edge research into how understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems, and processes to achieve specific functions. These may have applications of technological significance and help address global challenges.