Identifying Polymers that Bind or Reject Proteins with Machine Learning: Handling Categorical Features within a GPR Model.

ACS Polymers Au (IF 6.9, JCR Q1, Polymer Science)
Pub Date: 2026-02-11 · eCollection Date: 2026-04-08 · DOI: 10.1021/acspolymersau.5c00177
Ramindu De Silva, Wei Ge, Carolin Bapp, Ahmed Z Mustafa, Robert Chapman, Yanan Fan, Scott A Sisson, Martina H Stenzel
Journal Article. ACS Polymers Au, volume 6 (2), pages 587-598. Full text (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC13067167/pdf/

Abstract

Understanding the interaction between polymers and proteins is of interest to researchers in medicine, biology, food science, and water treatment, among other fields. In some applications the goal is to create strong interactions with enzymes to improve their catalytic stability, while in nanomedicine and biomedical engineering the focus is often on reducing protein adsorption on polymer surfaces. Researchers have developed libraries of polymers with various monomer combinations and tested their binding to different proteins to better understand these interactions. In this work, we aimed to identify the polymers with the highest and the lowest binding affinity across all proteins tested, using Gaussian Process Regression (GPR). However, categorical features such as the type of monomer have not been widely incorporated in GPR. Here we compare a range of Gaussian process models, named by their developers as the Multiplicative kernel, the Additive kernel, the Easy to interpret Gaussian Process model (EzGP), Latent Variable Gaussian Processes (LVGP), and Latent Map Gaussian Processes (LMGP). The LVGP model was found to perform best on the polymer-protein data set. Binding strength was measured by Förster resonance energy transfer (FRET), a readout that can help generate large data sets for machine learning (ML). The polymer that had the highest affinity to glucose oxidase (GOx), uricase (Uri), casein (Cas), trypsin (Trp), carbonic anhydrase (CAn), and bovine serum albumin (BSA) carried positive charges as well as hydrophobic benzyl groups. The polymer that rejected the most proteins was dominated by negatively charged monomers intermixed with some cationic units, reminiscent of zwitterionic polymers.
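To make the idea of multiplicative and additive kernels for mixed continuous and categorical inputs concrete, the following is a minimal NumPy sketch. It is not the authors' implementation: the function names, the simple exchangeable categorical kernel, the fixed hyperparameters, and the toy data are all assumptions chosen for illustration, and the EzGP, LVGP, and LMGP models named in the abstract are not reproduced here.

```python
import numpy as np

def rbf_kernel(Xa, Xb, lengthscale=1.0):
    """Squared-exponential kernel on continuous features (rows are samples)."""
    d2 = ((Xa[:, None, :] - Xb[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def categorical_kernel(Ca, Cb, theta=1.0):
    """Exchangeable kernel on integer-coded categorical features.

    Each mismatching categorical column multiplies the similarity by
    exp(-theta); identical category vectors give a similarity of 1.
    """
    mismatches = (Ca[:, None, :] != Cb[None, :, :]).sum(axis=-1)
    return np.exp(-theta * mismatches)

def mixed_kernel(Xa, Ca, Xb, Cb, mode="multiplicative"):
    """Combine the continuous and categorical kernels by product or sum."""
    k_cont = rbf_kernel(Xa, Xb)
    k_cat = categorical_kernel(Ca, Cb)
    if mode == "multiplicative":
        return k_cont * k_cat
    if mode == "additive":
        return k_cont + k_cat
    raise ValueError(f"unknown mode: {mode!r}")

def gp_posterior_mean(K_train, K_cross, y, noise=1e-2):
    """GP regression posterior mean for a zero-mean prior, via Cholesky."""
    L = np.linalg.cholesky(K_train + noise * np.eye(len(y)))
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return K_cross @ alpha

# Hypothetical toy data (not from the paper): one continuous feature,
# e.g. a monomer fraction, and one categorical feature, e.g. a monomer
# type coded as 0, 1, 2; y stands in for a binding readout.
X = np.array([[0.1], [0.4], [0.4], [0.9]])
C = np.array([[0], [0], [1], [2]])
y = np.array([0.2, 0.5, 0.9, 0.1])

K = mixed_kernel(X, C, X, C, mode="multiplicative")
mean = gp_posterior_mean(K, K, y)
print(mean)
```

With the product form, two polymers are similar only if they are close in both the continuous and the categorical features; with the sum form, closeness in either contributes. In practice the lengthscale, `theta`, and noise level would be fitted to data rather than fixed as they are here.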
