ProAll-D: protein allergen detection using long short term memory - a deep learning approach.

IF 3.4 Q2 CHEMISTRY, MEDICINAL
ADMET and DMPK Pub Date : 2022-09-13 eCollection Date: 2022-01-01 DOI:10.5599/admet.1335
Pallavi M Shanthappa, Rakshitha Kumar
ADMET and DMPK, volume 10, issue 3, pages 231-240. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9484702/pdf/
Citations: 3

Abstract

Background: An allergic reaction is an overreaction of the immune system to a previously encountered, typically benign molecule, frequently a protein. Allergic reactions can cause rashes, itching, swelling of the mucous membranes, asthma, coughing, and other unusual symptoms. A wide range of principles and methods have been applied in bioinformatics to predict allergenicity. The sequence-similarity approach based on FAO/WHO criteria has a very low positive predictive value, which makes it ineffective for predicting potential allergens.

Method: This work proposes a deep learning model, LSTM (Long Short-Term Memory), to overcome the limitations of traditional approaches and of lower-performing machine learning models in predicting the allergenicity of dietary proteins. A total of 2,427 allergens and 2,427 non-allergens were collected from a variety of sources, including the Central Science Laboratory and the NCBI. The data were split 80:20 into training and test sets. All techniques were implemented in Python. Five E-descriptors were used to describe the protein sequences of allergens and non-allergens: E1 (hydrophilic character of peptides), E2 (length), E3 (propensity to form helices), E4 (abundance and dispersion), and E5 (propensity to form beta strands). The variable-length protein sequences were converted to uniform-length vectors using the ACC (auto cross covariance) transformation. A total of eight machine learning techniques were considered.
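
The ACC step is what turns sequences of different lengths into vectors of one fixed length. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the commonly used auto cross covariance form ACC_jk(lag) = sum_i E_j[i] * E_k[i + lag] / (n - lag) and a maximum lag of 5, neither of which is stated in the abstract. The descriptor table `E_TABLE` is filled with seeded random placeholders, because the abstract does not reproduce the published E-descriptor values, and the function name `acc_transform` is hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
N_DESCRIPTORS = 5  # E1..E5

# PLACEHOLDER values only: seeded random numbers stand in for the published
# E-descriptor table, which the abstract does not reproduce.
_rng = random.Random(0)
E_TABLE = {aa: [_rng.uniform(-1.0, 1.0) for _ in range(N_DESCRIPTORS)]
           for aa in AMINO_ACIDS}


def acc_transform(sequence, max_lag=5, table=E_TABLE):
    """Map a protein sequence of any length to a fixed-length ACC vector.

    For every descriptor pair (j, k) and every lag in 1..max_lag:
        ACC_jk(lag) = sum_i E_j[i] * E_k[i + lag] / (n - lag)
    The result has N_DESCRIPTORS**2 * max_lag entries (max_lag is assumed).
    """
    rows = [table[aa] for aa in sequence]  # one row of 5 values per residue
    n = len(rows)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(N_DESCRIPTORS):
            for k in range(N_DESCRIPTORS):
                total = sum(rows[i][j] * rows[i + lag][k]
                            for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Two made-up sequences of different lengths give vectors of equal length.
short = acc_transform("MKTAYIAKQRQISFVK")
long_ = acc_transform("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEK")
print(len(short), len(long_))
```

With five descriptors and an assumed maximum lag of 5, every sequence maps to 5 x 5 x 5 = 125 features, so the output can be fed to any model that expects fixed-size input.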

Results: The reported accuracies were 64.14 % for Gaussian Naive Bayes, 49.2 % for the Radius Neighbours Classifier, 85.8 % for the Bagging Classifier, 76.9 % for AdaBoost, 76.13 % for Linear Discriminant Analysis, 84.2 % for Quadratic Discriminant Analysis, 90 % for the Extra Tree Classifier, and 91.5 % for LSTM.
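
Accuracy figures like these are normally measured on the held-out part of the data. The sketch below shows one way an 80:20 split of a balanced data set and the accuracy metric could be computed. Splitting each class separately is an assumption, since the abstract does not say whether the split was stratified, and the helper names `split_80_20` and `accuracy` as well as the stand-in sample identifiers are hypothetical.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of items and return (train, test) in an 80:20 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Stand-in identifiers for the 2,427 allergens and 2,427 non-allergens.
allergens = [("allergen", i) for i in range(2427)]
non_allergens = [("non_allergen", i) for i in range(2427)]

# Assumption: split each class separately so both subsets stay balanced.
train_pos, test_pos = split_80_20(allergens, seed=1)
train_neg, test_neg = split_80_20(non_allergens, seed=2)
train = train_pos + train_neg
test = test_pos + test_neg
print(len(train), len(test))  # 3882 972

# Toy check of the metric: 3 of 4 predictions are correct.
print(accuracy([1, 0, 1, 0], [1, 0, 0, 0]))  # 0.75
```

Under these assumptions the test set holds 486 sequences per class, so one misclassified test sequence changes the accuracy by roughly 0.1 percentage points.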

Conclusion: With a reported AUC of 91.5 %, the LSTM is regarded as the best of the evaluated models for predicting allergens. A web server called ProAll-D has been created that identifies novel allergens using the LSTM approach. The ProAll-D server and data can be accessed at https://doi.org/10.17632/tjmt97xpjf.1.
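
The abstract gives 91.5 % as an accuracy in the Results and as an AUC in the Conclusion. The two metrics are computed differently: accuracy depends on a decision threshold, while AUC measures how well the scores rank positives above negatives. The sketch below illustrates this on made-up scores. It uses the pairwise (Mann-Whitney) formulation of AUC; the function names and the 0.5 threshold are assumptions, and none of the numbers come from the paper.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_at(y_true, scores, threshold=0.5):
    """Accuracy after turning scores into labels at a fixed threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(t == p for t, p in zip(y_true, preds)) / len(y_true)


# Made-up labels (1 = allergen) and model scores, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.45, 0.2, 0.1]
print(roc_auc(y_true, scores))      # 8 of 9 positive/negative pairs ranked correctly
print(accuracy_at(y_true, scores))  # 5 of 6 labels correct at threshold 0.5
```

On this toy data the AUC is 8/9 and the accuracy is 5/6, so the two values need not coincide even for the same model and test set.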


Source journal
ADMET and DMPK
CiteScore: 4.40
Self-citation rate: 0.00%
Articles published: 22
Review time: 4 weeks
About the journal: ADMET and DMPK is an open access journal devoted to the rapid dissemination of new and original scientific results in all areas of absorption, distribution, metabolism, excretion, toxicology and pharmacokinetics of drugs.

ADMET and DMPK publishes the following types of contributions:
- Original research papers
- Feature articles
- Review articles
- Short communications and Notes
- Letters to Editors
- Book reviews

The scope of the Journal involves, but is not limited to, the following areas:
- physico-chemical properties of drugs and methods of their determination
- drug permeabilities
- drug absorption
- drug-drug, drug-protein, drug-membrane and drug-DNA interactions
- chemical stability and degradation of drugs
- instrumental methods in ADMET
- drug metabolic processes
- routes of administration and excretion of drugs
- pharmacokinetic/pharmacodynamic studies
- quantitative structure activity/property relationships
- ADME/PK modelling
- Toxicology screening
- Transporter identification and study