Plant MicroRNA Prediction by Supervised Machine Learning Using C5.0 Decision Trees.

IF 1.3 Q4 BIOCHEMISTRY & MOLECULAR BIOLOGY

Journal of Nucleic Acids Pub Date : 2012-01-01 Epub Date: 2012-11-07 DOI:10.1155/2012/652979

Philip H Williams, Rod Eyles, Georg Weiller



Abstract

MicroRNAs (miRNAs) are nonprotein-coding RNAs between 20 and 22 nucleotides long that attenuate protein production. Different types of sequence data are being investigated for novel miRNAs, including genomic and transcriptomic sequences. A variety of machine learning methods have successfully predicted miRNA precursors, mature miRNAs, and other nonprotein-coding sequences. MirTools, mirDeep2, and miRanalyzer require a "read count" to be included with the input sequences, which restricts their use to deep-sequencing data. Our aim was to train a predictor on a cross-section of different species so that it accurately predicts miRNAs outside the training set. We wanted a system that did not require read counts for prediction and could therefore be applied to short sequences extracted from genomic, EST, or RNA-seq sources. We developed a miRNA-predictive decision-tree model by supervised machine learning. It requires only that the corresponding genome or transcriptome be available within a sequence window that includes the precursor candidate, so that the required sequence features can be collected. Among the most critical features for training the predictor are the miRNA:miRNA* duplex energy and the number of mismatches in the duplex. We present a cross-species plant miRNA predictor with 84.08% sensitivity and 98.53% specificity based on rigorous testing by leave-one-out validation.
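To make the evaluation terms in the abstract concrete, the following is a minimal sketch of leave-one-out validation followed by a sensitivity and specificity calculation. It is not the authors' pipeline: a single-threshold rule on the duplex mismatch count (a decision stump) stands in for the C5.0 decision tree, and the function names and toy data are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def fit_stump(mismatches: Sequence[int], labels: Sequence[int]) -> int:
    """Pick the mismatch cut-off that classifies the most training samples
    correctly, predicting 'miRNA' (1) when mismatches <= cut-off.
    This stump is an assumed stand-in for C5.0, not the paper's model."""
    best_cut, best_correct = 0, -1
    for cut in sorted(set(mismatches)):
        correct = sum((x <= cut) == bool(y) for x, y in zip(mismatches, labels))
        if correct > best_correct:  # ties keep the smallest cut-off
            best_cut, best_correct = cut, correct
    return best_cut


def leave_one_out(mismatches: List[int], labels: List[int]) -> List[int]:
    """Hold out each sample in turn, train on the rest, predict the held-out one."""
    predictions = []
    for i in range(len(mismatches)):
        train_x = mismatches[:i] + mismatches[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        cut = fit_stump(train_x, train_y)
        predictions.append(int(mismatches[i] <= cut))
    return predictions


def sensitivity_specificity(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec


# Hypothetical toy data: duplex mismatch counts and labels (1 = miRNA, 0 = not).
toy_mismatches = [0, 1, 1, 2, 5, 6, 7, 3]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]

toy_predictions = leave_one_out(toy_mismatches, toy_labels)
sens, spec = sensitivity_specificity(toy_labels, toy_predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Because every prediction is made by a model that never saw the held-out sample, the resulting sensitivity and specificity estimate performance on unseen sequences, which matches the sense in which the abstract reports its figures. A real reproduction would replace the stump with a C5.0 tree trained on the full feature set, including duplex energy.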


