Seq_B_LSTM_CNN_HPO: Rare Mendelian Diseases to Genotypes Associations from Multiple Data Sources

Mohamed Elhajabdou, Amr Maged Ehelw, H. Eldib, Mohamed Elhabrouk
Applied Cell Biology, 84 1. Journal article, published 2022-08-26.
DOI: https://doi.org/10.53043/2320-1991.acb90029

Abstract

Motivation: Genotype-phenotype annotations have become a crucial tool for studying abnormalities in phenotypic diseases. These abnormalities and their relations can help uncover complex, hidden information. This information describes the causes of genetic mutations in organisms such as humans. Several systems and algorithms have been proposed and implemented to address this problem, since digital information describing human mutations and gene variations is freely available online from different resources. Machine learning, especially deep artificial neural networks, has proven able to overcome the limitations of these traditional algorithms and to achieve markedly higher accuracy than conventional methods such as statistical techniques.

Results: In this paper, a multilabel hyper-artificial neural network classifier, called Seq_B_LSTM_CNN_HPO, is proposed and implemented for predicting rare Mendelian diseases. The proposed system is trained on more than 50 features obtained from four data sources, Gene Ontology (GO), Human Phenotype Ontology (HPO), UniProtKB, and gene expression data, to learn complex features and relations. It was tested on the UniProtKB dataset and compared with other systems proposed in the field. The experiments were performed on human data for a variety of analytical studies in order to find new relations between phenotypic diseases. The results are evaluated using six evaluation metrics, Fmax, Precision, Recall, AUPR, AUROC, and Smin, with scores of 0.894, 0.902, 0.886, 0.711, 0.631, and 0.384, respectively, which outperformed several systems proposed in the literature.

Data and Source Code Availability: The source code is provided in a GitHub repository and the dataset is uploaded to Google Drive.
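For readers unfamiliar with the Fmax metric reported above, the following is a minimal sketch of how a sample-averaged Fmax, with the precision and recall at the best threshold, is commonly computed for multilabel predictions. It is not the authors' evaluation code: the function name `fmax`, the threshold grid, and the toy arrays are assumptions made for illustration, and the abstract does not state which exact variant of the metric was used.

```python
import numpy as np


def fmax(y_true, y_score, thresholds=None):
    """Return (fmax, precision, recall, threshold) for multilabel scores.

    y_true:  (n_samples, n_labels) binary matrix of true annotations.
    y_score: (n_samples, n_labels) matrix of predicted scores in [0, 1].
    Hypothetical helper: precision is averaged over samples with at least
    one prediction at the threshold, recall over samples with at least one
    true label.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    if thresholds is None:
        thresholds = np.linspace(0.01, 1.0, 100)  # assumed grid

    n_true = y_true.sum(axis=1)
    has_truth = n_true > 0
    if not has_truth.any():
        raise ValueError("y_true contains no positive labels")

    best = (0.0, 0.0, 0.0, None)
    for t in thresholds:
        y_pred = y_score >= t
        n_pred = y_pred.sum(axis=1)
        n_hit = (y_pred & y_true).sum(axis=1)
        has_pred = n_pred > 0
        if not has_pred.any():
            continue  # no sample has a prediction at this threshold
        precision = float(np.mean(n_hit[has_pred] / n_pred[has_pred]))
        recall = float(np.mean(n_hit[has_truth] / n_true[has_truth]))
        if precision + recall == 0:
            continue
        f = 2 * precision * recall / (precision + recall)
        if f > best[0]:
            best = (f, precision, recall, float(t))
    return best


if __name__ == "__main__":
    # Toy data: two genes, three candidate phenotype labels (made up).
    truth = [[1, 0, 1], [0, 1, 0]]
    scores = [[0.9, 0.2, 0.8], [0.1, 0.7, 0.6]]
    print(fmax(truth, scores))
```

Sweeping the threshold matters because a multilabel classifier outputs one score per label, so precision and recall trade off as the cut-off moves; Fmax reports the best balance found over the grid.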