tRNA-DL: A Deep Learning Approach to Improve tRNAscan-SE Prediction Results.

IF 1.1 · Zone 4 (Biology) · JCR Q4 (Genetics & Heredity)
Human Heredity · Pub Date: 2018-01-01 · Epub Date: 2019-01-25 · DOI: 10.1159/000493215
Xin Gao, Zhi Wei, Hakon Hakonarson
Human Heredity, Volume 83, Issue 3, Pages 163-172. Journal Article.
Citations: 6

Abstract

Background: tRNAscan-SE is the leading tool for transfer RNA (tRNA) annotation and is widely used in the field. However, tRNAscan-SE can return a significant number of false positives when applied to large sequences. Conventional machine learning methods have recently been proposed to address this issue, but their efficiency can still be limited by their dependence on handcrafted features. With the growing availability of large-scale genomic datasets, deep learning methods, especially convolutional neural networks, have demonstrated excellent power in characterizing patterns in genomic sequences. We therefore hypothesize that deep learning may further improve tRNA prediction.

Methods: We proposed a new computational approach based on deep neural networks to predict tRNA gene sequences. We designed and investigated various deep neural network architectures. We used tRNA sequences as positive samples and the false-positive tRNA sequences predicted by tRNAscan-SE in coding sequences as negative samples. With these data we trained the proposed models and evaluated them against conventional machine learning methods and popular tRNA prediction tools.
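
The labeling scheme described in Methods can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function name `build_dataset`, the split fraction, and the toy sequences are all assumptions made for the example, and the sequences are invented placeholders, not real tRNA data.

```python
import random

def build_dataset(positives, negatives, test_fraction=0.2, seed=0):
    """Label sequences (1 = tRNA, 0 = tRNAscan-SE false positive),
    shuffle them reproducibly, and split into train and test lists."""
    labeled = [(seq, 1) for seq in positives] + [(seq, 0) for seq in negatives]
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    rng.shuffle(labeled)
    n_test = max(1, int(len(labeled) * test_fraction))
    return labeled[n_test:], labeled[:n_test]

# Invented placeholder sequences, for illustration only.
positives = ["GCGGATTTAGCT", "GGGCGAATAGTG", "GCCCGGATAGCT", "GGTCTCGTAGTG"]
negatives = ["ATGGCCAAGCTT", "ATGAAACGTCTG", "ATGCTGGATCCA", "ATGTTTGGCAAA"]

train, test = build_dataset(positives, negatives, test_fraction=0.25)
print(len(train), len(test))
```

The split here is a plain random shuffle. A stratified split, which keeps the positive-to-negative ratio equal in both parts, may be preferable when the classes are imbalanced.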

Results: With one-hot encoding of the input sequences, our proposed models extract features without extensive manual feature engineering. Our best model outperformed the existing methods under different performance metrics.
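
One-hot encoding and the way a convolutional filter reads it can be sketched in a few lines. This is an illustrative sketch only: the helper names `one_hot` and `conv1d_max` are hypothetical, the filter weights are set by hand to match the motif `GTTC` (in a trained network they would be learned), and nothing here reflects the architecture actually used in the paper.

```python
ALPHABET = "ACGT"

def one_hot(seq):
    """Encode a nucleotide string as rows of four 0/1 values (A, C, G, T).
    U is treated as T; any other symbol (e.g. N) becomes an all-zero row."""
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        idx = ALPHABET.find(base)
        if idx >= 0:
            row[idx] = 1
        rows.append(row)
    return rows

def conv1d_max(encoded, kernel):
    """Slide one filter along the sequence, apply ReLU, and return the
    largest activation (global max pooling)."""
    width = len(kernel)
    best = 0.0                          # starting at 0 acts as the ReLU floor
    for start in range(len(encoded) - width + 1):
        score = sum(
            encoded[start + i][j] * kernel[i][j]
            for i in range(width)
            for j in range(4)
        )
        best = max(best, score)
    return best

kernel = one_hot("GTTC")                # hand-set filter matching one motif
activation = conv1d_max(one_hot("AAGTTCGA"), kernel)
print(activation)                       # 4: the motif matches at every position
```

Because the filter operates directly on the encoded bases, no hand-designed features such as stem lengths or GC content need to be computed beforehand.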

Conclusion: The proposed deep learning methods can substantially reduce the false positives output by the state-of-the-art tool tRNAscan-SE. Coupled with tRNAscan-SE, they can serve as a useful complementary tool for tRNA annotation. The application to tRNA prediction demonstrates the superiority of deep learning in automatic feature generation for characterizing sequence patterns.
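
A reduction in false positives of this kind can be quantified by comparing confusion-matrix counts before and after a second-stage filter. The sketch below is an assumption-laden illustration: the abstract does not say which metrics were used, so precision, recall, and specificity are chosen here as common examples, and the labels, scores, and the 0.5 threshold are made-up toy values rather than results from the paper.

```python
def confusion(labels, predictions):
    """Count true/false positives and negatives for binary labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Return precision, recall, and specificity (0.0 when undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return precision, recall, specificity

# Toy values: eight candidates, all reported as tRNA by the first-stage tool.
labels = [1, 1, 1, 0, 0, 0, 0, 1]                    # 1 = real tRNA
scores = [0.9, 0.8, 0.4, 0.2, 0.6, 0.1, 0.3, 0.7]    # hypothetical classifier scores

before = confusion(labels, [1] * len(labels))        # no filtering
after = confusion(labels, [1 if s >= 0.5 else 0 for s in scores])

print(before, metrics(*before))
print(after, metrics(*after))
```

In this toy case the filter removes three of the four false positives at the cost of one missed tRNA, which shows why precision and recall are usually reported together.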

Source journal: Human Heredity (Biology: Genetics)
CiteScore: 2.50
Self-citation rate: 0.00%
Articles published: 12
Review time: >12 weeks
Journal description: Gathering original research reports and short communications from all over the world, "Human Heredity" is devoted to methodological and applied research on the genetics of human populations, association and linkage analysis, genetic mechanisms of disease, and new methods for statistical genetics, for example, analysis of rare variants and results from next generation sequencing. The value of this information to many branches of medicine is shown by the number of citations the journal receives in fields ranging from immunology and hematology to epidemiology and public health planning, and the fact that at least 50% of all "Human Heredity" papers are still cited more than 8 years after publication (according to ISI Journal Citation Reports). Special issues on methodological topics (such as ‘Consanguinity and Genomics’ in 2014; ‘Analyzing Rare Variants in Complex Diseases’ in 2012) or reviews of advances in particular fields (‘Genetic Diversity in European Populations: Evolutionary Evidence and Medical Implications’ in 2014; ‘Genes and the Environment in Obesity’ in 2013) are published every year. Renowned experts in the field are invited to contribute to these special issues.