PepMNet: a hybrid deep learning model for predicting peptide properties using hierarchical graph representations

IF 3.2 · Zone 3 (Engineering & Technology) · JCR Q2, CHEMISTRY, PHYSICAL

Molecular Systems Design & Engineering · Published: 2024-12-11 · DOI: 10.1039/D4ME00172A

Daniel Garzon Otero, Omid Akbari and Camille Bilodeau

Citation: Molecular Systems Design & Engineering, 3, pp. 205–218 (journal article). Article page: https://pubs.rsc.org/en/content/articlelanding/2025/me/d4me00172a

Citations: 0

Abstract

Peptides are a powerful class of molecules that can be applied to a range of problems, including biomaterials development and drug design. Currently, machine learning-based property prediction models for peptides rely primarily on the amino acid sequence, which leads to two key limitations: first, they are not compatible with non-natural peptide features such as modified side chains or staples, and second, they use hand-crafted features to describe the relationships between different amino acids, which reduces the model's flexibility and generalizability. To address these challenges, we have developed PepMNet, a deep learning model that integrates atom-level and amino acid-level information through a hierarchical graph approach. The model first learns from an atom-level graph and then generates amino acid representations based on the atomic information captured in the first stage. These amino acid representations are then combined using graph convolutions on an amino acid-level graph to produce a molecular-level representation, which is passed to a fully connected neural network for property prediction. We evaluated this architecture by predicting two peptide properties: chromatographic retention time (RT) as a regression task and antimicrobial peptide (AMP) activity as a classification task. For the regression task, PepMNet achieved an average R² of 0.980 across eight datasets spanning different dataset sizes and three liquid chromatography (LC) methods. For the classification task, we developed an ensemble of five models to reduce overfitting and improve the robustness of classification, achieving an area under the receiver operating characteristic curve (AUC-ROC) of 0.978 and an average precision of 0.981. Overall, our model illustrates the potential of hierarchical deep learning models to learn peptide properties without relying on human-engineered amino acid features.
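The abstract describes a staged data flow: message passing on an atom-level graph, pooling of atom embeddings into one vector per amino acid, graph convolution on an amino acid-level graph, and a fully connected head on the resulting molecule vector. The Python sketch below illustrates only that flow and is not PepMNet's implementation. It assumes a simple mean-aggregation graph convolution, mean pooling for both the atom-to-residue and residue-to-molecule steps, toy one-hot atom features and random, untrained weights; all function and parameter names (`graph_conv`, `mean_pool`, `hierarchical_forward`, `W_atom`, and so on) are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


def graph_conv(features, edges, W):
    """Assumed, simplified layer (not the PepMNet layer): each node averages
    its own features with its neighbours' features, then applies W and ReLU."""
    neighbours = [[i] for i in range(len(features))]  # start with a self-loop
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)
    out = []
    for nbrs in neighbours:
        dim = len(features[nbrs[0]])
        agg = [sum(features[j][k] for j in nbrs) / len(nbrs) for k in range(dim)]
        out.append(relu(matvec(W, agg)))
    return out


def mean_pool(features, assignment, n_groups):
    """Average node features within each group (e.g. atoms -> amino acids)."""
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(n_groups)]
    counts = [0] * n_groups
    for feat, g in zip(features, assignment):
        counts[g] += 1
        for k in range(dim):
            sums[g][k] += feat[k]
    return [[s / counts[g] for s in sums[g]] for g in range(n_groups)]


def random_matrix(rows, cols, rng):
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def init_params(atom_dim, hidden, seed=0):
    """Random, untrained weights; hypothetical parameter names."""
    rng = random.Random(seed)
    return {
        "W_atom": random_matrix(hidden, atom_dim, rng),
        "W_res": random_matrix(hidden, hidden, rng),
        "W_fc": random_matrix(hidden, hidden, rng),
        "w_out": random_matrix(1, hidden, rng)[0],
    }


def hierarchical_forward(atom_feats, atom_edges, atom_to_residue, residue_edges, params):
    n_residues = max(atom_to_residue) + 1
    # Stage 1: message passing on the atom-level graph
    h_atoms = graph_conv(atom_feats, atom_edges, params["W_atom"])
    # Stage 2: one vector per amino acid, built from its atoms' embeddings
    h_res = mean_pool(h_atoms, atom_to_residue, n_residues)
    # Stage 3: message passing on the amino acid-level graph
    h_res = graph_conv(h_res, residue_edges, params["W_res"])
    # Stage 4: molecule-level readout (assumed: mean over residues)
    h_mol = mean_pool(h_res, [0] * n_residues, 1)[0]
    # Stage 5: fully connected head producing one scalar (e.g. a retention time)
    hidden = relu(matvec(params["W_fc"], h_mol))
    return sum(w * h for w, h in zip(params["w_out"], hidden))


if __name__ == "__main__":
    C, N, O = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
    # Toy Gly-Gly backbone heavy atoms (N, CA, C, O per residue);
    # side chains and the terminal oxygen are omitted for brevity
    atom_feats = [N, C, C, O, N, C, C, O]
    atom_edges = [(0, 1), (1, 2), (2, 3), (2, 4), (4, 5), (5, 6), (6, 7)]
    atom_to_residue = [0, 0, 0, 0, 1, 1, 1, 1]
    residue_edges = [(0, 1)]  # the peptide bond between residue 0 and residue 1
    params = init_params(atom_dim=3, hidden=8, seed=0)
    print(hierarchical_forward(atom_feats, atom_edges, atom_to_residue, residue_edges, params))
```

Because the input in this sketch is an atom graph, a modified side chain would only change `atom_feats`, `atom_edges` and `atom_to_residue`, and a staple could be expressed as extra atom bonds plus an extra residue-level edge; no hand-crafted amino acid descriptors are needed at any stage.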

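The reported metrics can be made concrete with a small standard-library sketch. It assumes the five ensemble members are combined by averaging their predicted probabilities (the abstract does not say how they are combined), computes AUC-ROC as the probability that a randomly chosen positive peptide outscores a randomly chosen negative one, computes average precision as the non-interpolated step sum over thresholds of (R_n - R_{n-1}) * P_n (the convention used by scikit-learn's `average_precision_score`), and computes R² as the coefficient of determination 1 - SS_res / SS_tot (the paper may define R² differently, e.g. as a squared correlation). All function names are hypothetical, and the probabilities in the example are made up.

```python
def ensemble_mean(member_probs):
    """Average per-peptide probabilities across ensemble members (assumed rule)."""
    n_models = len(member_probs)
    return [sum(col) / n_models for col in zip(*member_probs)]


def roc_auc(labels, scores):
    """AUC-ROC as a pairwise ranking probability; tied pairs count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Non-interpolated AP: sum over thresholds of (recall gain) x precision."""
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples sharing the same score are treated as a single threshold
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * tp / (tp + fp)
        prev_recall = recall
    return ap


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up probabilities from five hypothetical ensemble members for four peptides
    member_probs = [
        [0.91, 0.35, 0.72, 0.10],
        [0.88, 0.40, 0.65, 0.20],
        [0.95, 0.30, 0.70, 0.15],
        [0.85, 0.45, 0.60, 0.05],
        [0.90, 0.38, 0.68, 0.12],
    ]
    labels = [1, 0, 1, 0]  # 1 = antimicrobial, 0 = not antimicrobial
    probs = ensemble_mean(member_probs)
    print("AUC-ROC:", roc_auc(labels, probs))
    print("AP:", average_precision(labels, probs))
```

Averaging probabilities is one common way to combine ensemble members because it smooths out the idiosyncratic errors of any single model, which is consistent with the stated goal of reducing overfitting; majority voting or other rules would also be possible.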



Source journal

Molecular Systems Design & Engineering (category: Engineering, Biomedical Engineering)

CiteScore: 6.40

Self-citation rate: 2.80%

Articles published: 144

About the journal: Molecular Systems Design & Engineering provides a hub for cutting-edge research into how an understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems and processes that achieve specific functions. These may have applications of technological significance and help address global challenges.