iBitter-Stack: A multi-representation ensemble learning model for accurate bitter peptide identification

Impact factor 4.5; Zone 2 (Biology); JCR Q1 (Biochemistry & Molecular Biology)
Sarfraz Ahmad, Momina Ahsan, Muhammad Nabeel Asim, Andreas Dengel, Muhammad Imran Malik
Journal of Molecular Biology, volume 437, issue 24, Article 169448. Published 2025-09-18.
DOI: 10.1016/j.jmb.2025.169448
Full text: https://www.sciencedirect.com/science/article/pii/S0022283625005145

Abstract

The identification of bitter peptides is crucial in various domains, including food science, drug discovery, and biochemical research. These peptides not only contribute to the undesirable taste of hydrolyzed proteins but also play key roles in physiological and pharmacological processes. However, experimental methods for identifying bitter peptides are time-consuming and expensive. With the rapid expansion of peptide sequence databases in the post-genomic era, the demand for efficient computational approaches to distinguish bitter from non-bitter peptides has become increasingly significant. In this study, we propose a novel stacking-based ensemble learning framework aimed at enhancing the accuracy and reliability of bitter peptide classification. Our method integrates diverse sequence-based feature representations and leverages a broad set of machine learning classifiers. The first stacking layer comprises multiple base classifiers, each trained on distinct feature encoding schemes, while the second layer employs logistic regression to refine predictions using an eight-dimensional probability vector. Extensive evaluations on a carefully curated dataset demonstrate that our model significantly outperforms existing predictive methods, providing a robust and reliable computational tool for bitter peptide identification. Our approach achieves an accuracy of 96.09% and a Matthews Correlation Coefficient (MCC) of 0.9220 on the independent test set, underscoring its effectiveness and generalizability. To facilitate real-time usage and broader accessibility, we have also developed a user-friendly web server based on the proposed method, which is freely accessible at ibitter-stack-webserver.streamlit.app. This tool enables researchers and practitioners to conveniently screen peptide sequences for bitterness in real-time applications.
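
The two-layer design described in the abstract (base classifiers that each emit a probability, followed by a logistic regression over the resulting eight-dimensional vector) can be illustrated with a minimal plain-Python sketch. This is not the authors' implementation: the abstract does not name the feature encodings or base classifiers, so the first layer is replaced here by simulated noisy probabilities, and every function name, the learning rate, the epoch count, and the toy data are assumptions made only for illustration. The sketch also shows how accuracy and the Matthews Correlation Coefficient are computed from a confusion matrix.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, b, x):
    # Meta-learner output: P(bitter) given one vector of base-model probabilities.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def accuracy_and_mcc(y_true, y_pred):
    """Return (accuracy, Matthews Correlation Coefficient) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, mcc


def simulate_base_probs(label, rng, n_models=8):
    # ASSUMPTION: stand-in for layer 1. Each hypothetical base classifier emits
    # a noisy P(bitter); real base models would be trained on feature encodings.
    center = 0.7 if label == 1 else 0.3
    return [min(1.0, max(0.0, rng.gauss(center, 0.2))) for _ in range(n_models)]


rng = random.Random(0)  # fixed seed so the toy run is reproducible
y_train = [i % 2 for i in range(200)]  # synthetic labels: 1 = bitter, 0 = non-bitter
y_test = [i % 2 for i in range(100)]
X_train = [simulate_base_probs(label, rng) for label in y_train]
X_test = [simulate_base_probs(label, rng) for label in y_test]

# Layer 2: logistic regression over the eight-dimensional probability vector.
w, b = fit_logistic(X_train, y_train)
y_pred = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X_test]
acc, mcc = accuracy_and_mcc(y_test, y_pred)
print(f"toy accuracy={acc:.3f}, toy MCC={mcc:.3f}")
```

The printed numbers describe only the synthetic toy data and say nothing about the published results. In a real pipeline, the meta-learner would normally be trained on out-of-fold base-model probabilities so that it never sees predictions made on data the base models were fitted to.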

Source journal: Journal of Molecular Biology (Biology: Biochemistry and Molecular Biology)
CiteScore: 11.30
Self-citation rate: 1.80%
Articles published: 412
Review time: 28 days
About the journal: Journal of Molecular Biology (JMB) provides high quality, comprehensive and broad coverage in all areas of molecular biology. The journal publishes original scientific research papers that provide mechanistic and functional insights and report a significant advance to the field. The journal encourages the submission of multidisciplinary studies that use complementary experimental and computational approaches to address challenging biological questions. Research areas include but are not limited to: Biomolecular interactions, signaling networks, systems biology; Cell cycle, cell growth, cell differentiation; Cell death, autophagy; Cell signaling and regulation; Chemical biology; Computational biology, in combination with experimental studies; DNA replication, repair, and recombination; Development, regenerative biology, mechanistic and functional studies of stem cells; Epigenetics, chromatin structure and function; Gene expression; Membrane processes, cell surface proteins and cell-cell interactions; Methodological advances, both experimental and theoretical, including databases; Microbiology, virology, and interactions with the host or environment; Microbiota mechanistic and functional studies; Nuclear organization; Post-translational modifications, proteomics; Processing and function of biologically important macromolecules and complexes; Molecular basis of disease; RNA processing, structure and functions of non-coding RNAs, transcription; Sorting, spatiotemporal organization, trafficking; Structural biology; Synthetic biology; Translation, protein folding, chaperones, protein degradation and quality control.