Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

Journal of Nucleic Acids (IF 1.3; JCR Q4, Biochemistry & Molecular Biology)
Pub Date: 2012-01-01; Epub Date: 2012-11-07; DOI: 10.1155/2012/652979
Philip H Williams, Rod Eyles, Georg Weiller

Abstract

MicroRNAs (miRNAs) are non-protein-coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data, including genomic and transcriptomic sequences, are being investigated for novel miRNAs. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other non-protein-coding sequences. MirTools, mirDeep2, and miRanalyzer require a "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor on a cross-section of different species so that it accurately predicts miRNAs outside the training set. We wanted a system that did not require read counts for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. A miRNA-predictive decision-tree model has been developed by supervised machine learning. It requires only that the corresponding genome or transcriptome be available within a sequence window that includes the precursor candidate, so that the required sequence features can be collected. Among the most critical features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
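
To make the evaluation terms concrete, the following is a minimal Python sketch of how sensitivity and specificity can be estimated by leave-one-out validation. It is not the authors' pipeline: a one-level decision tree (a "stump") stands in for C5.0, the two features (duplex energy and duplex mismatch count) are taken from the abstract but their values are invented toy numbers, and all function and variable names are hypothetical.

```python
from typing import List, Tuple

Features = Tuple[float, float]   # (duplex energy in kcal/mol, mismatches) -- toy features
Stump = Tuple[int, float, bool]  # (feature index, threshold, "<= threshold means miRNA")


def train_stump(X: List[Features], y: List[bool]) -> Stump:
    """Fit a one-level decision tree: the single split with the fewest training errors.

    This is a simple stand-in for a full decision-tree learner such as C5.0.
    """
    best: Stump = (0, 0.0, True)
    best_errors = len(y) + 1
    for f in range(len(X[0])):
        values = sorted({x[f] for x in X})
        # Candidate thresholds are midpoints between consecutive observed values.
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for below_is_pos in (True, False):
                errors = sum(
                    ((x[f] <= t) == below_is_pos) != label for x, label in zip(X, y)
                )
                if errors < best_errors:
                    best, best_errors = (f, t, below_is_pos), errors
    return best


def predict(stump: Stump, x: Features) -> bool:
    f, t, below_is_pos = stump
    return (x[f] <= t) == below_is_pos


def leave_one_out(X: List[Features], y: List[bool]) -> Tuple[float, float]:
    """Hold out each sample once, train on the rest, and tally the confusion matrix."""
    tp = fn = tn = fp = 0
    for i in range(len(X)):
        stump = train_stump(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = predict(stump, X[i])
        if y[i]:
            tp += pred
            fn += not pred
        else:
            fp += pred
            tn += not pred
    sensitivity = tp / (tp + fn)  # fraction of true miRNAs recovered
    specificity = tn / (tn + fp)  # fraction of non-miRNAs correctly rejected
    return sensitivity, specificity


# Invented toy data: the first four rows are labelled miRNA, the last four are not.
X: List[Features] = [(-40.0, 1), (-38.5, 2), (-42.1, 0), (-36.0, 3),
                     (-20.0, 6), (-15.5, 7), (-25.0, 5), (-18.0, 8)]
y: List[bool] = [True] * 4 + [False] * 4

sensitivity, specificity = leave_one_out(X, y)
print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%}")
```

Because every sample is predicted by a model that never saw it, the two rates estimate performance on unseen sequences, which is the sense in which the abstract reports its 84.08% and 98.53% figures. The toy data here are cleanly separable, so the sketch says nothing about the difficulty of the real task.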
