基于有趣的非冗余对比序列规则寻找新的诊断基因模式

2011 IEEE 11th International Conference on Data Mining Pub Date : 2011-12-11 DOI:10.1109/ICDM.2011.68

Yuhai Zhao, Guoren Wang, Yuan Li, Zhanghui Wang

{"title":"基于有趣的非冗余对比序列规则寻找新的诊断基因模式","authors":"Yuhai Zhao, Guoren Wang, Yuan Li, Zhanghui Wang","doi":"10.1109/ICDM.2011.68","DOIUrl":null,"url":null,"abstract":"Diagnostic genes refer to the genes closely related to a specific disease phenotype, the powers of which to distinguish between different classes are often high. Most methods to discovering the powerful diagnostic genes are either singleton discriminability-based or combination discriminability-based. However, both ignore the abundant interactions among genes, which widely exist in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose an EWave model, which profitably exploits the ordered expressions among genes based on the defined equivalent dimension group sequences taking into account the \"noise\" universal in the real data, (2) we devise a novel sequence rule, namely interesting non-redundant contrast sequence rule, which is able to capture the difference between different phenotypes in a high accuracy using as few as possible genes, (3) we present an efficient algorithm called NRMINER to find such rules. Unlike the conventional column enumeration and the more recent row enumeration, it performs a novel template-driven enumeration by making use of the special characteristic of micro array data modeled by EWave. Extensive experiments conducted on various synthetic and real datasets show that: (1) NRMINER is significantly faster than the competing algorithm by up to about one order of magnitude, (2) it provides a higher accuracy using fewer genes. Many diagnostic genes discovered by NRMINER are proved biologically related to some disease.","PeriodicalId":106216,"journal":{"name":"2011 IEEE 11th International Conference on Data Mining","volume":"35 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-12-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"14","resultStr":"{\"title\":\"Finding Novel Diagnostic Gene Patterns Based on Interesting Non-redundant Contrast Sequence Rules\",\"authors\":\"Yuhai Zhao, Guoren Wang, Yuan Li, Zhanghui Wang\",\"doi\":\"10.1109/ICDM.2011.68\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Diagnostic genes refer to the genes closely related to a specific disease phenotype, the powers of which to distinguish between different classes are often high. Most methods to discovering the powerful diagnostic genes are either singleton discriminability-based or combination discriminability-based. However, both ignore the abundant interactions among genes, which widely exist in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose an EWave model, which profitably exploits the ordered expressions among genes based on the defined equivalent dimension group sequences taking into account the \\\"noise\\\" universal in the real data, (2) we devise a novel sequence rule, namely interesting non-redundant contrast sequence rule, which is able to capture the difference between different phenotypes in a high accuracy using as few as possible genes, (3) we present an efficient algorithm called NRMINER to find such rules. Unlike the conventional column enumeration and the more recent row enumeration, it performs a novel template-driven enumeration by making use of the special characteristic of micro array data modeled by EWave. Extensive experiments conducted on various synthetic and real datasets show that: (1) NRMINER is significantly faster than the competing algorithm by up to about one order of magnitude, (2) it provides a higher accuracy using fewer genes. Many diagnostic genes discovered by NRMINER are proved biologically related to some disease.\",\"PeriodicalId\":106216,\"journal\":{\"name\":\"2011 IEEE 11th International Conference on Data Mining\",\"volume\":\"35 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-12-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 IEEE 11th International Conference on Data Mining\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDM.2011.68\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 IEEE 11th International Conference on Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDM.2011.68","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 14

摘要

诊断基因是指与特定疾病表型密切相关的基因，其区分不同类别的能力往往很高。大多数发现强大诊断基因的方法要么是基于单例判别性，要么是基于组合判别性。然而，两者都忽略了基因之间丰富的相互作用，这在现实世界中广泛存在。在本文中，我们从一个新的角度来解决这个问题，并做出以下贡献:(1)我们提出了一个EWave模型，该模型基于定义的等效维群序列，考虑到实际数据中普遍存在的“噪声”，有效地利用了基因之间的有序表达;(2)我们设计了一种新的序列规则，即有趣的非冗余对比序列规则，它能够使用尽可能少的基因以高精度捕获不同表型之间的差异;(3)我们提出了一种称为NRMINER的高效算法来寻找这些规则。与传统的列枚举和最近的行枚举不同，它通过利用EWave建模的微阵列数据的特殊特性来执行一种新颖的模板驱动枚举。在各种合成和真实数据集上进行的大量实验表明:(1)NRMINER比竞争算法显著快一个数量级，(2)使用更少的基因提供更高的精度。NRMINER发现的许多诊断基因被证明与某些疾病具有生物学相关性。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Finding Novel Diagnostic Gene Patterns Based on Interesting Non-redundant Contrast Sequence Rules

Diagnostic genes refer to the genes closely related to a specific disease phenotype, the powers of which to distinguish between different classes are often high. Most methods to discovering the powerful diagnostic genes are either singleton discriminability-based or combination discriminability-based. However, both ignore the abundant interactions among genes, which widely exist in the real world. In this paper, we tackle the problem from a new point of view and make the following contributions: (1) we propose an EWave model, which profitably exploits the ordered expressions among genes based on the defined equivalent dimension group sequences taking into account the "noise" universal in the real data, (2) we devise a novel sequence rule, namely interesting non-redundant contrast sequence rule, which is able to capture the difference between different phenotypes in a high accuracy using as few as possible genes, (3) we present an efficient algorithm called NRMINER to find such rules. Unlike the conventional column enumeration and the more recent row enumeration, it performs a novel template-driven enumeration by making use of the special characteristic of micro array data modeled by EWave. Extensive experiments conducted on various synthetic and real datasets show that: (1) NRMINER is significantly faster than the competing algorithm by up to about one order of magnitude, (2) it provides a higher accuracy using fewer genes. Many diagnostic genes discovered by NRMINER are proved biologically related to some disease.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2011 IEEE 11th International Conference on Data Mining

自引率

0.00%

发文量