Robust Selection of Predictive Genes via a Simple Classifier.

Veronica Vinciotti, Allan Tucker, Paul Kellam, Xiaohui Liu
Applied Bioinformatics, vol. 5, no. 1, pp. 1-11. Published 2006-01-01. DOI: 10.2165/00822942-200605010-00001 (https://doi.org/10.2165/00822942-200605010-00001)
Citations: 9

Abstract


Identifying, from expression data, the genes that direct the mechanism of a disease is extremely useful in understanding how that mechanism works. This in turn may lead to better diagnoses and, potentially, to a cure for the disease. The task becomes extremely challenging when the data have only a small number of samples and a high number of dimensions, as is often the case with gene expression data. Motivated by this challenge, we present a general framework that focuses on simplicity and data perturbation, which are the keys to robust identification of the most predictive features in such data. Within this framework, we propose a simple selective naive Bayes classifier discovered using a global search technique, and combine it with data perturbation to increase its robustness for small sample sizes. The method was validated extensively on two applied microarray datasets and a simulated dataset, all confounded by small sample sizes and high dimensionality. It was shown to be capable of selecting genes known to be associated with prostate cancer and viral infections.
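The general idea of a selective naive Bayes classifier combined with data perturbation can be illustrated with a minimal sketch. The abstract does not specify the likelihood model, the global search technique, or the form of perturbation, so everything concrete below is an assumption made for illustration: Gaussian class-conditional densities, simulated annealing over feature subsets, random subsampling as the perturbation, a small per-feature penalty to favour simple models, and all function names and parameter values. It is not the authors' implementation.

```python
import math
import random

def fit_gaussian_nb(X, y, features):
    """Per-class log prior plus (mean, variance) for each chosen feature."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        stats = []
        for j in features:
            vals = [r[j] for r in rows]
            mean = sum(vals) / len(vals)
            # Small constant keeps the variance away from zero (assumed smoothing).
            var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-3
            stats.append((mean, var))
        model[c] = (math.log(len(rows) / len(y)), stats)
    return model

def predict(model, x, features):
    """Return the class with the highest naive Bayes log posterior."""
    def log_post(c):
        log_prior, stats = model[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x[j] - mean) ** 2 / (2 * var)
            for j, (mean, var) in zip(features, stats))
    return max(model, key=log_post)

def loo_score(X, y, features, penalty=0.01):
    """Leave-one-out accuracy minus a per-feature penalty (assumed criterion)."""
    if not features:
        return 0.0
    hits = 0
    for i in range(len(X)):
        model = fit_gaussian_nb(X[:i] + X[i + 1:], y[:i] + y[i + 1:], features)
        hits += predict(model, X[i], features) == y[i]
    return hits / len(X) - penalty * len(features)

def anneal_subset(X, y, rng, steps=80, t0=0.1):
    """Simulated annealing over feature subsets; returns the best subset seen."""
    p = len(X[0])
    current = {rng.randrange(p)}
    cur_score = loo_score(X, y, sorted(current))
    best, best_score = set(current), cur_score
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9      # linear cooling schedule
        candidate = current ^ {rng.randrange(p)}   # flip one feature in or out
        cand_score = loo_score(X, y, sorted(candidate))
        if (cand_score >= cur_score
                or rng.random() < math.exp((cand_score - cur_score) / temp)):
            current, cur_score = candidate, cand_score
            if cur_score > best_score:
                best, best_score = set(current), cur_score
    return best

def selection_frequency(X, y, n_runs=10, frac=0.8, seed=0):
    """Fraction of perturbed datasets (random subsamples) that select each feature."""
    rng = random.Random(seed)
    counts = [0] * len(X[0])
    n_keep = int(frac * len(X))
    for _ in range(n_runs):
        idx = rng.sample(range(len(X)), n_keep)
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        for j in anneal_subset(Xs, ys, rng):
            counts[j] += 1
    return [c / n_runs for c in counts]

# Toy data (not from the paper): 24 samples, 15 "genes", only gene 0 is informative.
data_rng = random.Random(1)
y = [0] * 12 + [1] * 12
X = [[data_rng.gauss(4.0 * label if j == 0 else 0.0, 1.0) for j in range(15)]
     for label in y]
freq = selection_frequency(X, y)
print([round(f, 2) for f in freq])
```

Ranking features by how often they are selected across perturbed datasets, rather than trusting a single search run, is one plausible way to obtain the kind of robustness to small sample sizes that the abstract describes.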
