Multi-view framework for multi-label bioactive peptide classification based on multi-modal representation learning

IF 7.2 | CAS Zone 1 (Computer Science) | JCR Q1, COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Applied Soft Computing | Pub Date: 2025-03-27 | DOI: 10.1016/j.asoc.2025.113007

Yan Kang, Yue Peng, Dongsheng Zheng, Huadong Zhang, Xuekun Yang

Volume 175, Article 113007 | https://www.sciencedirect.com/science/article/pii/S1568494625003187

Citations: 0

Abstract

The diversity and specific biological functions of bioactive peptides make them key regulators of various physiological processes and important contributors to the development of new anti-infective drugs. Although existing graph-based deep learning methods model multi-label peptide representations effectively, they often fail to incorporate multi-modal feature representations or to extract multi-scale features from different views. To address these limitations, we present a multi-view framework for multi-label bioactive peptide classification based on multi-modal representation learning that combines amino acid sequences with fused molecular fingerprints. The peptide molecular graph is constructed from extracted topological information and node features. Multi-view branches are built from sequence-based and graph-based models to exploit their distinct and complementary strengths. Specifically, the protein language model ESM-2 is used to extract deep residue-level features from amino acid sequences. Local features of the molecular fingerprints are learned through a Filter Response Normalization (FRN) layer and a Thresholded Linear Unit (TLU). In addition, a Mamba module is employed to capture long-range dependencies while reducing time complexity. Compared with state-of-the-art models, our model achieves significantly better and more robust performance on multi-label bioactive peptide prediction, reaching 82.5% coverage, 80.9% precision, and 80.3% accuracy on the MFBP dataset. Furthermore, visual analyses show that the model effectively captures features from multiple views, and examining its decision process highlights the model's interpretability.
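The abstract pairs a Filter Response Normalization layer with a Thresholded Linear Unit to learn local fingerprint features. As a rough sketch of what that pairing computes, the function below follows the standard FRN/TLU formulation: ν² is the mean of x² over positions for each channel, y = γ·x/√(ν² + ε) + β, and z = max(y, τ). FRN normalizes each sample on its own (no batch statistics) and does not subtract a mean, which is why it is usually paired with TLU's learned threshold τ instead of a fixed ReLU cut at zero. How the paper arranges fingerprint bits into positions and channels is not stated here, so the [position][channel] layout, the function name `frn_tlu`, and passing γ, β, τ as fixed arguments (they would be learned in a real model) are assumptions for illustration.

```python
import math
from typing import List, Sequence


def frn_tlu(
    x: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
    tau: Sequence[float],
    eps: float = 1e-6,
) -> List[List[float]]:
    """Hypothetical sketch: Filter Response Normalization followed by a Thresholded Linear Unit.

    x is ONE sample laid out as [position][channel] (assumed layout, e.g. the
    output of a 1-D convolution over fingerprint bits). gamma, beta and tau are
    per-channel parameters that a real model would learn.
    """
    length = len(x)
    channels = len(x[0])
    # nu^2: mean squared activation of each channel over all positions (no mean subtraction)
    nu2 = [sum(x[i][c] ** 2 for i in range(length)) / length for c in range(channels)]
    out: List[List[float]] = []
    for i in range(length):
        row = []
        for c in range(channels):
            x_hat = x[i][c] / math.sqrt(nu2[c] + eps)  # FRN normalization
            y = gamma[c] * x_hat + beta[c]              # per-channel affine transform
            row.append(max(y, tau[c]))                  # TLU: learned threshold instead of ReLU's 0
        out.append(row)
    return out
```

For example, `frn_tlu([[1.0, -2.0], [3.0, 0.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0], tau=[0.0, -0.5])` divides the first channel by √5 and the second by √2, then clips the second channel's negative value at its threshold of -0.5.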
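The reported scores (82.5% coverage, 80.9% precision, 80.3% accuracy) are multi-label metrics, but the abstract does not define them. Multi-label peptide benchmarks commonly use example-based definitions: for each peptide i with true label set Y_i and predicted set Ŷ_i, precision averages |Y_i ∩ Ŷ_i| / |Ŷ_i|, coverage averages |Y_i ∩ Ŷ_i| / |Y_i|, and accuracy averages |Y_i ∩ Ŷ_i| / |Y_i ∪ Ŷ_i|. Assuming those definitions (not confirmed by the abstract), the sketch below computes them; the function name `multilabel_metrics` and the label strings in the usage note are hypothetical.

```python
from typing import Dict, Sequence, Set


def multilabel_metrics(
    true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]
) -> Dict[str, float]:
    """Example-based multi-label metrics averaged over N samples (assumed definitions)."""
    n = len(true_sets)
    precision = coverage = accuracy = 0.0
    for true, pred in zip(true_sets, pred_sets):
        hit = len(true & pred)                         # correctly predicted labels
        precision += hit / len(pred) if pred else 0.0  # share of predicted labels that are right
        coverage += hit / len(true) if true else 0.0   # share of true labels recovered
        union = len(true | pred)
        accuracy += hit / union if union else 0.0      # Jaccard overlap per sample
    return {"precision": precision / n, "coverage": coverage / n, "accuracy": accuracy / n}
```

With true sets `[{"AMP"}, {"ACP", "AMP"}, {"AHP"}]` and predictions `[{"AMP"}, {"ACP"}, {"AIP"}]`, precision is 2/3 while coverage and accuracy are both 0.5: missing AMP on the second peptide lowers coverage and accuracy but not precision, which is why the three numbers can differ.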


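Source journal

Applied Soft Computing (Engineering & Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 15.80

Self-citation rate: 6.90%

Articles published: 874

Review time: 10.9 months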

Journal description: Applied Soft Computing is an international journal promoting an integrated view of soft computing to solve real-life problems. The focus is to publish the highest-quality research on the application and convergence of Fuzzy Logic, Neural Networks, Evolutionary Computing, Rough Sets, and other similar techniques to address real-world complexities. Applied Soft Computing is a rolling publication: articles are published as soon as the editor-in-chief has accepted them, so the website is continuously updated with new articles and publication times are short.