Alg-MFDL: A multi-feature deep learning framework for allergenic proteins prediction

IF 2.6 | Zone 4, Biology | Q2, Biochemical Research Methods

Analytical Biochemistry. Pub Date: 2024-10-29. DOI: 10.1016/j.ab.2024.115701

Xiang Hu, Jingyi Li, Taigang Liu

Analytical Biochemistry, volume 697, Article 115701. Journal Article. Full text: https://www.sciencedirect.com/science/article/pii/S0003269724002458

Citations: 0

Abstract

The rising global incidence of allergies illustrates the growing impact of allergic conditions on global health. Allergens are small-molecule antigens that trigger allergic reactions. A widely recognized strategy for allergy prevention is to identify allergens and avoid re-exposure. However, laboratory methods for identifying allergenic proteins are often time-consuming and resource-intensive, so efficient and reliable computational approaches are needed. In this study, we developed a novel allergenic protein predictor named Alg-MFDL, which integrates pre-trained protein language models (PLMs) with traditional handcrafted features to achieve a more complete protein representation. First, we compared the performance of eight pre-trained PLMs from ProtTrans and ESM-2 and selected the best-performing model from each of the two groups. In addition, we evaluated three handcrafted features and their combinations to select the optimal feature or feature combination. Then, these three protein representations (the two selected PLM embeddings and the selected handcrafted features) were fused and used as inputs to train a convolutional neural network (CNN). Finally, independent validation was performed on benchmark datasets to evaluate the performance of Alg-MFDL. Alg-MFDL achieved an accuracy of 0.973, a precision of 0.996, a sensitivity of 0.951, and an F1 value of 0.973, outperforming most current state-of-the-art (SOTA) methods across all key metrics. We anticipate that the proposed model will be a useful tool for predicting allergenic proteins.
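As a quick consistency check on the reported metrics, the sketch below recomputes F1 as the harmonic mean of precision and sensitivity (recall) using the standard definitions. The helper names `f1_score` and `metrics_from_counts`, and the confusion-matrix counts in the second example, are hypothetical and chosen only for illustration; they are not taken from the paper or its code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "f1": f1_score(precision, sensitivity),
    }


# Precision and sensitivity as reported in the abstract.
reported_precision = 0.996
reported_sensitivity = 0.951
f1 = f1_score(reported_precision, reported_sensitivity)
print(round(f1, 3))  # 0.973, matching the reported F1 value

# Hypothetical counts (NOT from the paper), only to show the helper in use.
example = metrics_from_counts(tp=95, fp=1, tn=99, fn=5)
print(round(example["accuracy"], 3))  # 0.97
```

The reported precision and sensitivity reproduce the reported F1 of 0.973 to three decimal places. High precision paired with lower sensitivity suggests the model rarely flags non-allergens as allergens but misses a small fraction of true allergens.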

The datasets and code used in this study are freely available at https://github.com/Hupenpen/Alg-MFDL.

Source journal: Analytical Biochemistry (Biology: Analytical Chemistry)

CiteScore: 5.70

Self-citation rate: 0.00%

Articles published: 283

Review time: 44 days

Journal description: The journal's title, Analytical Biochemistry: Methods in the Biological Sciences, declares its broad scope: methods for the basic biological sciences, including biochemistry, molecular genetics, cell biology, proteomics, immunology, bioinformatics, and wherever the frontiers of research take the field. The emphasis is on methods ranging from the strictly analytical to the more preparative, including novel approaches to protein purification as well as improvements in cell and organ culture. The techniques covered are equally inclusive, ranging from aptamers to zymology. The journal has been particularly active in: analytical techniques for biological molecules; aptamer selection and utilization; biosensors; chromatography; cloning, sequencing and mutagenesis; electrochemical methods; electrophoresis; enzyme characterization methods; immunological approaches; mass spectrometry of proteins and nucleic acids; metabolomics; nano-level techniques; and optical spectroscopy in all its forms. The journal is reluctant to include most drug and strictly clinical studies, as there are more suitable publication platforms for these types of papers.