AllerTrans: An Improved Protein Allergenicity Prediction Model Using Deep Learning

bioRxiv - Bioinformatics. Pub Date: 2024-08-10. DOI: 10.1101/2024.08.09.607419

Faezeh Sarlakifar, Hamed Malek, Najaf Allahyari Fard, Zahra Khotanlou

Citations: 0

Abstract

Recognizing the potential allergenicity of proteins is essential for ensuring their safety. Allergens are a major concern in protein safety assessment, especially given the increasing use of recombinant proteins in new medical products, and such proteins need careful allergenicity assessment. However, traditional laboratory testing for allergenicity is expensive and time-consuming. Bioinformatics offers efficient and cost-effective alternatives for predicting protein allergenicity. In this study, we developed an enhanced deep-learning model that predicts the potential allergenicity of a protein from its primary structure, represented as a protein sequence. Our approach uses two protein language models to extract distinct feature vectors for each sequence, which are then fed into a deep neural network for classification. Each feature vector captures a specific aspect of the protein sequence, and combining them improves the final result and balances the model's sensitivity and specificity. The model classifies proteins as allergenic or non-allergenic. Compared with the AlgPred 2.0 model, our proposed model improves on all evaluation metrics, achieving a sensitivity of 97.91%, a specificity of 97.69%, an accuracy of 97.80%, and an area under the ROC curve of 99% on the AlgPred 2.0 dataset under standard five-fold cross-validation.
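The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the two feature vectors are combined, so plain concatenation is an assumption here, and the function names (`combine_features`, `stratified_folds`, `binary_metrics`) are hypothetical. The sketch covers only the bookkeeping around the classifier: joining two embeddings, splitting samples into five class-balanced folds, and computing sensitivity, specificity, and accuracy from predictions.

```python
from typing import Dict, List, Sequence


def combine_features(emb_a: Sequence[float], emb_b: Sequence[float]) -> List[float]:
    """Join two per-sequence embeddings into one feature vector.

    Assumption: simple concatenation; the abstract only says the vectors
    are combined.
    """
    return list(emb_a) + list(emb_b)


def stratified_folds(labels: Sequence[int], k: int = 5) -> List[List[int]]:
    """Assign sample indices to k folds so each fold has a similar class ratio."""
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        # Deal the members of each class round-robin across the folds.
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy; 1 = allergenic, 0 = non-allergenic."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn) if (tp + fn) else 0.0,
        "specificity": tn / (tn + fp) if (tn + fp) else 0.0,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
    }


if __name__ == "__main__":
    # Toy inputs for illustration only; these are not data from the paper.
    features = combine_features([0.1, 0.2], [0.3, 0.4, 0.5])
    folds = stratified_folds([1] * 10 + [0] * 10, k=5)
    metrics = binary_metrics([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 0, 1, 1])
    print(len(features), [len(f) for f in folds], metrics)
```

In a five-fold run, each fold serves once as the held-out set while a classifier is trained on the other four, and the per-fold metrics are then averaged. Sensitivity is the fraction of allergenic proteins correctly flagged, and specificity is the fraction of non-allergenic proteins correctly cleared, which is why the abstract reports both alongside accuracy.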
