基因表达数据分析中整合理论和数据驱动的特征选择。

Vineet K Raghu, Xiaoyu Ge, Panos K Chrysanthis, Panayiotis V Benos
{"title":"基因表达数据分析中整合理论和数据驱动的特征选择。","authors":"Vineet K Raghu, Xiaoyu Ge, Panos K Chrysanthis, Panayiotis V Benos","doi":"10.1109/ICDE.2017.223","DOIUrl":null,"url":null,"abstract":"<p><p>The exponential growth of high dimensional biological data has led to a rapid increase in demand for automated approaches for knowledge production. Existing methods rely on two general approaches to address this challenge: 1) the Theory-driven approach, which utilizes prior accumulated knowledge, and 2) the Data-driven approach, which solely utilizes the data to deduce scientific knowledge. Both of these approaches alone suffer from bias toward past/present knowledge, as they fail to incorporate all of the current knowledge that is available to make new discoveries. In this paper, we show how an integrated method can effectively address the high dimensionality of big biological data, which is a major problem for pure data-driven analysis approaches. We realize our approach in a novel two-step analytical workflow that incorporates a new feature selection paradigm as the first step to handling high-throughput gene expression data analysis and that utilizes graphical causal modeling as the second step to handle the automatic extraction of causal relationships. Our results, on real-world clinical datasets from The Cancer Genome Atlas (TCGA), demonstrate that our method is capable of intelligently selecting genes for learning effective causal networks.</p>","PeriodicalId":74570,"journal":{"name":"Proceedings. International Conference on Data Engineering","volume":"2017 ","pages":"1525-1532"},"PeriodicalIF":0.0000,"publicationDate":"2017-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799807/pdf/nihms937517.pdf","citationCount":"0","resultStr":"{\"title\":\"Integrated Theory- and Data-driven Feature Selection in Gene Expression Data Analysis.\",\"authors\":\"Vineet K Raghu, Xiaoyu Ge, Panos K Chrysanthis, Panayiotis V Benos\",\"doi\":\"10.1109/ICDE.2017.223\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>The exponential growth of high dimensional biological data has led to a rapid increase in demand for automated approaches for knowledge production. Existing methods rely on two general approaches to address this challenge: 1) the Theory-driven approach, which utilizes prior accumulated knowledge, and 2) the Data-driven approach, which solely utilizes the data to deduce scientific knowledge. Both of these approaches alone suffer from bias toward past/present knowledge, as they fail to incorporate all of the current knowledge that is available to make new discoveries. In this paper, we show how an integrated method can effectively address the high dimensionality of big biological data, which is a major problem for pure data-driven analysis approaches. We realize our approach in a novel two-step analytical workflow that incorporates a new feature selection paradigm as the first step to handling high-throughput gene expression data analysis and that utilizes graphical causal modeling as the second step to handle the automatic extraction of causal relationships. Our results, on real-world clinical datasets from The Cancer Genome Atlas (TCGA), demonstrate that our method is capable of intelligently selecting genes for learning effective causal networks.</p>\",\"PeriodicalId\":74570,\"journal\":{\"name\":\"Proceedings. International Conference on Data Engineering\",\"volume\":\"2017 \",\"pages\":\"1525-1532\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2017-04-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5799807/pdf/nihms937517.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings. International Conference on Data Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDE.2017.223\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"2017/5/18 0:00:00\",\"PubModel\":\"Epub\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings. International Conference on Data Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE.2017.223","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2017/5/18 0:00:00","PubModel":"Epub","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

摘要

高维生物数据的指数级增长导致对知识生产自动化方法的需求迅速增加。现有方法依赖于两种一般方法来应对这一挑战:1)理论驱动方法,利用先前积累的知识;2)数据驱动方法,仅利用数据推断科学知识。这两种方法都存在对过去/现在知识的偏见,因为它们无法将所有可用的当前知识纳入新发现中。在本文中,我们展示了一种集成方法如何有效地解决大生物数据的高维问题,这是纯数据驱动分析方法的一个主要问题。我们在一种新的两步分析工作流程中实现了我们的方法,该工作流程将新的特征选择范式作为处理高通量基因表达数据分析的第一步,并利用图形因果建模作为处理因果关系自动提取的第二步。我们在来自癌症基因组图谱(TCGA)的真实临床数据集上的研究结果表明,我们的方法能够智能地选择基因以学习有效的因果网络。
本文章由计算机程序翻译,如有差异,请以英文原文为准。

Integrated Theory- and Data-driven Feature Selection in Gene Expression Data Analysis.

Integrated Theory- and Data-driven Feature Selection in Gene Expression Data Analysis.

Integrated Theory- and Data-driven Feature Selection in Gene Expression Data Analysis.

The exponential growth of high dimensional biological data has led to a rapid increase in demand for automated approaches for knowledge production. Existing methods rely on two general approaches to address this challenge: 1) the Theory-driven approach, which utilizes prior accumulated knowledge, and 2) the Data-driven approach, which solely utilizes the data to deduce scientific knowledge. Both of these approaches alone suffer from bias toward past/present knowledge, as they fail to incorporate all of the current knowledge that is available to make new discoveries. In this paper, we show how an integrated method can effectively address the high dimensionality of big biological data, which is a major problem for pure data-driven analysis approaches. We realize our approach in a novel two-step analytical workflow that incorporates a new feature selection paradigm as the first step to handling high-throughput gene expression data analysis and that utilizes graphical causal modeling as the second step to handle the automatic extraction of causal relationships. Our results, on real-world clinical datasets from The Cancer Genome Atlas (TCGA), demonstrate that our method is capable of intelligently selecting genes for learning effective causal networks.

求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
CiteScore
6.10
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信