AllerTrans: a deep learning method for predicting the allergenicity of protein sequences.

Impact factor 1.3 (JCR Q3, Biochemical Research Methods)

Biology Methods and Protocols. Published: 2025-07-09. eCollection date: 2025-01-01. DOI: 10.1093/biomethods/bpaf040

Faezeh Sarlakifar, Hamed Malek, Najaf Allahyari Fard

Biology Methods and Protocols, volume 10, issue 1, article bpaf040.


Abstract

Allergens are a major concern in determining protein safety, especially with the growing use of recombinant proteins in new medical products. These proteins require careful allergenicity assessment to ensure their safety. However, traditional laboratory tests for allergenicity are expensive and time-consuming. Bioinformatics offers efficient, cost-effective alternatives for predicting protein allergenicity, and deep learning models are a promising solution for this purpose. With the recent emergence of protein language models (pLMs), informative feature vectors can now be extracted directly from protein sequences. Although different computational methods can be effective individually, we hypothesized that combining them could improve prediction. Given this hypothesis, can we develop a more powerful approach than existing methods for predicting protein allergenicity? In this study, we developed an enhanced deep learning model that predicts the potential allergenicity of a protein from its primary structure, that is, its amino acid sequence. In simple terms, the model classifies each protein sequence as allergenic or non-allergenic. Our approach uses two pLMs to extract distinct feature vectors for each sequence; these vectors are then fed into a deep neural network (DNN) for classification, and combining the two feature vectors improves the results. Finally, we integrated our top-performing models using ensemble modeling, an approach intended to balance the model's sensitivity and specificity. The proposed model improves on existing models, achieving a sensitivity of 97.91%, a specificity of 97.69%, an accuracy of 97.80%, and an area under the receiver operating characteristic curve (AUC) of 99% under standard 2-fold cross-validation.
The AllerTrans model has been deployed as a web-based prediction tool and is publicly accessible at https://huggingface.co/spaces/sfaezella/AllerTrans.
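
The abstract describes the pipeline only at a high level: two pLM feature vectors per sequence, fused and passed to a DNN, with the best models combined in an ensemble. The sketch below illustrates that general idea under loudly stated assumptions. Each pLM is assumed to return per-residue embeddings that are mean-pooled into one vector per sequence, the two vectors are concatenated, a tiny fully connected network maps the fused vector to a probability, and the ensemble is assumed to use soft voting (averaging probabilities). The pooling choice, the helper names (`mean_pool`, `fuse`, `make_dnn`, `soft_vote`), the layer sizes, the hand-set weights, and the 0.5 threshold are all hypothetical placeholders, not the AllerTrans configuration.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]


def mean_pool(per_residue: Sequence[Sequence[float]]) -> Vector:
    """Average per-residue embeddings into one fixed-length vector (assumed pooling)."""
    n = len(per_residue)
    return [sum(row[j] for row in per_residue) / n for j in range(len(per_residue[0]))]


def fuse(emb_a: Vector, emb_b: Vector) -> Vector:
    """Concatenate the two pLM feature vectors into one input vector."""
    return list(emb_a) + list(emb_b)


def dense(x: Vector, weights: Sequence[Vector], bias: Vector) -> Vector:
    """Fully connected layer: output k is dot(weights[k], x) + bias[k]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_dnn(w1: Sequence[Vector], b1: Vector, w2: Vector, b2: float) -> Model:
    """Hypothetical one-hidden-layer network (ReLU, then sigmoid) returning P(allergenic)."""
    def predict(x: Vector) -> float:
        hidden = [max(0.0, h) for h in dense(x, w1, b1)]
        logit = sum(w * h for w, h in zip(w2, hidden)) + b2
        return 1.0 / (1.0 + math.exp(-logit))
    return predict


def soft_vote(models: Sequence[Model], x: Vector, threshold: float = 0.5) -> int:
    """Assumed soft-voting ensemble: average probabilities; 1 = allergenic, 0 = not."""
    p = sum(m(x) for m in models) / len(models)
    return int(p >= threshold)


# Usage with made-up numbers: 3 residues, pLM "A" gives 2-D and pLM "B" 3-D embeddings.
plm_a = [[0.2, 0.4], [0.6, 0.0], [0.1, 0.5]]
plm_b = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [0.5, 0.5, 0.5]]
x = fuse(mean_pool(plm_a), mean_pool(plm_b))  # fused vector of length 5

# Two hand-weighted stand-ins for trained DNNs (weights are arbitrary, not learned).
model_1 = make_dnn([[1.0] * 5], [0.0], [2.0], -1.0)
model_2 = make_dnn([[-1.0, 0.0, 0.0, 0.0, 0.0]], [0.0], [1.0], 0.0)
label = soft_vote([model_1, model_2], x)
```

In a real system the per-residue embeddings would come from the two pLMs and the weights from training; concatenation is simply the most direct way to let one classifier see both representations at once.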

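The reported figures are standard binary-classification metrics. The sketch below shows how sensitivity, specificity, accuracy, and ROC AUC can be computed from true labels and classifier scores, together with a stratified 2-fold split of the kind used in 2-fold cross-validation. It assumes label 1 = allergenic and a 0.5 decision threshold; the toy labels and scores are invented, and the stratified split (shuffle each class, then alternate its indices between folds) is one common way to build folds, not necessarily the procedure used for AllerTrans.

```python
import random
from typing import List, Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (TP, FN, TN, FP), treating 1 = allergenic as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def sens_spec_acc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    tp, fn, tn, fp = confusion(y_true, y_pred)
    sensitivity = tp / (tp + fn)  # fraction of allergens correctly flagged
    specificity = tn / (tn + fp)  # fraction of non-allergens correctly cleared
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC = P(random allergen scores above random non-allergen); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_two_fold(labels: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split indices into two folds with (near-)equal class proportions."""
    rng = random.Random(seed)
    folds: Tuple[List[int], List[int]] = ([], [])
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % 2].append(i)
    return folds


# Toy example: 4 allergens (1) and 4 non-allergens (0) with invented scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.05]
y_pred = [int(s >= 0.5) for s in scores]
sens, spec, acc = sens_spec_acc(y_true, y_pred)
auc = roc_auc(y_true, scores)
fold_a, fold_b = stratified_two_fold(y_true)
```

In 2-fold cross-validation, a model is trained on one fold and evaluated on the other, and then the roles are swapped. Accuracy is the class-size-weighted mean of sensitivity and specificity, so the reported 97.80% equalling the plain mean of 97.91% and 97.69% is what one would expect if the evaluation data contain roughly equal numbers of allergens and non-allergens.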

