AllerTrans: An Improved Protein Allergenicity Prediction Model Using Deep Learning

Faezeh Sarlakifar, Hamed Malek, Najaf Allahyari Fard, Zahra Khotanlou
bioRxiv - Bioinformatics, published 2024-08-10
DOI: 10.1101/2024.08.09.607419 (https://doi.org/10.1101/2024.08.09.607419)

Abstract

Recognizing the potential allergenicity of proteins is essential for ensuring their safety, especially given the increasing use of recombinant proteins in new medical products, which require careful allergenicity assessment. Traditional laboratory testing for allergenicity, however, is expensive and time-consuming, and bioinformatics offers efficient, cost-effective alternatives for predicting it. In this study, we developed an enhanced deep-learning model that predicts the potential allergenicity of a protein from its primary structure, represented as its amino acid sequence. Our approach uses two protein language models to extract a distinct feature vector from each for every sequence; these vectors are then fed into a deep neural network for classification. Each feature vector captures a specific aspect of the protein sequence, and combining them improves the final result and balances the model's sensitivity and specificity. The model classifies each protein as allergenic or non-allergenic. Compared with the AlgPred 2.0 model, our proposed model improves on all evaluation metrics, achieving a sensitivity of 97.91%, a specificity of 97.69%, an accuracy of 97.80%, and an area under the ROC curve of 99% on the AlgPred 2.0 dataset using standard five-fold cross-validation.
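For readers unfamiliar with the reported metrics, the following is a minimal sketch of how sensitivity, specificity, and accuracy are conventionally computed for a binary allergen classifier. It is not the authors' evaluation code: the function names and the toy labels are illustrative assumptions, and the label encoding (1 for allergenic, 0 for non-allergenic) is also assumed.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn), assuming 1 = allergenic and 0 = non-allergenic."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        elif truth == 0 and pred == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy from binary labels."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return {
        # Fraction of true allergens that were predicted as allergens.
        "sensitivity": tp / (tp + fn),
        # Fraction of true non-allergens that were predicted as non-allergens.
        "specificity": tn / (tn + fp),
        # Fraction of all predictions that were correct.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


# Toy labels for illustration only; these are not data from the paper.
toy_truth = [1, 1, 1, 0, 0, 0, 0, 1]
toy_pred = [1, 1, 0, 0, 0, 1, 0, 1]
toy_metrics = binary_metrics(toy_truth, toy_pred)
```

Under five-fold cross-validation, metrics like these would typically be computed on each held-out fold and then averaged; how the authors aggregate across folds is not stated in the abstract.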