PepMNet: a hybrid deep learning model for predicting peptide properties using hierarchical graph representations†

Impact factor 3.2 · Zone 3 (Engineering & Technology) · JCR Q2 (Chemistry, Physical)
Daniel Garzon Otero, Omid Akbari and Camille Bilodeau
Molecular Systems Design & Engineering, vol. 3, pp. 205-218. Published 2024-12-11. DOI: 10.1039/D4ME00172A
Article page: https://pubs.rsc.org/en/content/articlelanding/2025/me/d4me00172a

Abstract

Peptides are a powerful class of molecules that can be applied to a range of problems including biomaterials development and drug design. Currently, machine learning-based property prediction models for peptides primarily rely on amino acid sequence, resulting in two key limitations: first, they are not compatible with non-natural peptide features like modified sidechains or staples, and second, they use human-crafted features to describe the relationships between different amino acids, which reduces the model's flexibility and generalizability. To address these challenges, we have developed PepMNet, a deep learning model that integrates atom-level and amino acid-level information through a hierarchical graph approach. The model first learns from an atom-level graph and then generates amino acid representations based on the atomic information captured in the first stage. These amino acid representations are combined using graph convolutions on an amino acid-level graph to produce a molecular-level representation, which is then passed to a fully connected neural network for property prediction. We evaluated this architecture by predicting two peptide properties: chromatographic retention time (RT) as a regression task and antimicrobial peptide (AMP) activity as a classification task. For the regression task, PepMNet achieved an average R² of 0.980 across eight datasets, which spanned different dataset sizes and three liquid chromatography (LC) methods. For the classification task, we developed an ensemble of five models to reduce overfitting and ensure robust classification performance, achieving an area under the receiver operating characteristic curve (AUC-ROC) of 0.978 and an average precision of 0.981. Overall, our model illustrates the potential for hierarchical deep learning models to learn peptide properties without relying on human-engineered amino acid features.
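The two-level flow described in the abstract (atom-level graph, then amino acid representations, then an amino acid-level graph, then a molecular representation and a prediction head) can be illustrated with a small dependency-free sketch. This is not the PepMNet implementation: the mean-aggregation update, the mean pooling, the linear readout, and every name here (`message_pass`, `pool_mean`, `predict`, `atom_to_residue`) are assumptions chosen only to show how atom information is grouped by residue before a second round of graph updates.

```python
# Illustrative sketch only: plain-Python mean aggregation, not the
# layers, features, or trained weights used in PepMNet.

def message_pass(features, edges):
    """One round of mean aggregation over each node and its neighbours."""
    n = len(features)
    neighbours = {i: [i] for i in range(n)}  # each node also sees itself
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    dim = len(features[0])
    return [
        [sum(features[j][k] for j in neighbours[i]) / len(neighbours[i])
         for k in range(dim)]
        for i in range(n)
    ]

def pool_mean(features, groups, n_groups):
    """Average node vectors within each group (assumes no group is empty)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_groups)]
    counts = [0] * n_groups
    for vec, g in zip(features, groups):
        counts[g] += 1
        for k in range(dim):
            sums[g][k] += vec[k]
    return [[s / counts[g] for s in sums[g]] for g in range(n_groups)]

def predict(atom_feats, atom_edges, atom_to_residue, residue_edges,
            weights, bias):
    """Atom graph -> amino acid graph -> molecule vector -> scalar output."""
    n_res = max(atom_to_residue) + 1
    h_atoms = message_pass(atom_feats, atom_edges)       # stage 1: atom level
    h_res = pool_mean(h_atoms, atom_to_residue, n_res)   # atoms -> amino acids
    h_res = message_pass(h_res, residue_edges)           # stage 2: amino acid level
    h_mol = pool_mean(h_res, [0] * n_res, 1)[0]          # amino acids -> molecule
    # Linear readout standing in for the fully connected network.
    return sum(w * x for w, x in zip(weights, h_mol)) + bias

# Toy "peptide": four atoms forming a chain, split across two residues.
toy_atoms = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
toy_bonds = [(0, 1), (1, 2), (2, 3)]
toy_membership = [0, 0, 1, 1]     # atom index -> residue index
toy_residue_edges = [(0, 1)]      # residues linked along the backbone
print(predict(toy_atoms, toy_bonds, toy_membership, toy_residue_edges,
              weights=[0.5, -0.25], bias=0.1))
```

Because the first stage operates on atoms and bonds rather than on a fixed amino acid alphabet, a modified sidechain or a staple would simply appear as extra atoms and edges in `atom_feats` and `atom_edges`, which is the property the abstract highlights over sequence-based models.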


Source journal: Molecular Systems Design & Engineering (Engineering: Biomedical Engineering)
CiteScore: 6.40
Self-citation rate: 2.80%
Articles published: 144
About the journal: Molecular Systems Design & Engineering provides a hub for cutting-edge research into how understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems, and processes to achieve specific functions. These may have applications of technological significance and help address global challenges.