Identifying essential features for the classification of real and pseudo microRNAs precursors using fuzzy decision trees

Na'el Abu-halaweh, R. Harrison
Published in: 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology
Publication date: 2010-05-02
DOI: 10.1109/CIBCB.2010.5510430 (https://doi.org/10.1109/CIBCB.2010.5510430)
Citations: 7

Abstract

MicroRNAs play an important role in post-transcriptional gene regulation. Experimental approaches to identifying microRNAs are expensive and time-consuming, and computational approaches have proven useful for identifying microRNA candidates. Most computational approaches rely on features extracted from microRNA precursors (pre-microRNAs) and their secondary structure, so selecting an appropriate feature set is critical to the prediction accuracy for pre-microRNA candidates. This work investigates the triplet elements encoding scheme and identifies the essential features needed for the correct classification of pre-microRNAs. To achieve these goals, an extension of the triplet elements encoding scheme is introduced. Features extracted using the extended scheme were combined with global features introduced in the literature, and a fuzzy decision tree (FDT) was used as both a classification tool and a feature selection tool. Unlike previous machine-learning-based approaches, FDT produces a human-comprehensible classification model. The interpretability of this model provides a means to identify the essential features needed to recognize microRNA candidates and offers a better understanding of the problem. Our results indicate that the triplet elements scheme is not superior to any of its proposed extensions. Further analysis revealed that including the features extracted using the triplet elements scheme does not add any value to this classification problem but rather introduces noisy features. Comparable classification results can be achieved using only the six global features identified by FDT.
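
The abstract does not define the triplet elements encoding, so the following is only a minimal sketch of the form in which that scheme is commonly described: each position contributes one "triplet element" made of its nucleotide plus the paired/unpaired state of itself and its two neighbours, giving 4 x 8 = 32 frequency features. The function name `triplet_features`, the toy hairpin, the dot-bracket input format, and the normalization by the total triplet count are assumptions for illustration. The paper's extended scheme, its global features, and the fuzzy decision tree are not reproduced here.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGU"

# All 32 triplet elements: middle nucleotide + three structure states,
# where "(" means paired and "." means unpaired.
TRIPLET_KEYS = [
    nt + "".join(states)
    for nt in NUCLEOTIDES
    for states in product("(.", repeat=3)
]


def triplet_features(sequence: str, structure: str) -> dict[str, float]:
    """Return normalized frequencies of the 32 triplet elements.

    `sequence` is an RNA string and `structure` is its dot-bracket
    secondary structure (assumed input format, same length).
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    seq = sequence.upper().replace("T", "U")
    # Both bracket directions simply mean "paired" in this encoding.
    states = structure.replace(")", "(")

    valid = set(TRIPLET_KEYS)
    counts = Counter()
    # Slide over every position that has a neighbour on both sides.
    for i in range(1, len(seq) - 1):
        key = seq[i] + states[i - 1:i + 2]
        if key in valid:  # skip ambiguous bases such as "N"
            counts[key] += 1

    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in TRIPLET_KEYS}


if __name__ == "__main__":
    # Toy hairpin, not a real pre-microRNA.
    features = triplet_features("GGGAAACCC", "(((...)))")
    for key, value in features.items():
        if value > 0:
            print(key, round(value, 3))
```

A feature vector of this kind would then be concatenated with global descriptors before training a classifier; which descriptors survive feature selection is the question the paper addresses.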