Proposal of application method of Inductive Logic Programming to microarray data

Proceedings of the 8th International Conference on Computational Systems-Biology and Bioinformatics. Pub Date: 2017-12-07. DOI: 10.1145/3156346.3156356

Hiromu Ide, M. Umezawa, H. Ohwada



Proposal of application method of Inductive Logic Programming to microarray data

This paper describes a three-step method for identifying the terms common to genes in microarray data. First, we use a random forest to extract disease-related genes. The random forest assigns each gene a variable importance; the higher the variable importance, the more effective the gene is as a feature for classification. We take genes whose variable importance is greater than 0 as positive samples for ILP and the remaining genes as negative samples. Next, we annotate the extracted genes using Gene Ontology (GO) and use the resulting terms as predicates for ILP. Annotation is the process of assigning GO terms to gene products. Finally, we use ILP to obtain rules about the terms common to the positive samples. ILP (Inductive Logic Programming) is a subfield of machine learning that uses logic programming as a uniform representation for examples, background knowledge, and hypotheses. ILP learns from background knowledge, which is represented in first-order logic. As a result, we extracted 1051 mRNAs as positive samples for ILP using the random forest, whose F-measure was 65.1%. We obtained about 4000 terms for each dataset and used them as predicates for ILP. In the end, we obtained some rules about the positive samples.
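The three steps described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the variable importances have already been computed by some random forest, and the gene names, the GO assignments, the predicate names `disease_related/1` and `has_term/2`, and all function names are hypothetical. The F-measure shown is the balanced form (harmonic mean of precision and recall), which is an assumption about the variant reported.

```python
from typing import Dict, List, Set, Tuple


def split_by_importance(importance: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Step 1: genes with variable importance > 0 are positives, the rest negatives."""
    positives = sorted(g for g, score in importance.items() if score > 0)
    negatives = sorted(g for g, score in importance.items() if not score > 0)
    return positives, negatives


def _atom(name: str) -> str:
    """Turn a gene name or GO ID into a lowercase Prolog-style atom."""
    return name.lower().replace(":", "_")


def to_ilp_facts(
    positives: List[str],
    negatives: List[str],
    annotations: Dict[str, Set[str]],
) -> Dict[str, List[str]]:
    """Step 2: render examples and GO annotations as Prolog-style facts.

    Predicate names are hypothetical; a real ILP system has its own input format.
    """
    return {
        "pos": [f"disease_related({_atom(g)})." for g in positives],
        "neg": [f"disease_related({_atom(g)})." for g in negatives],
        "background": [
            f"has_term({_atom(g)}, {_atom(t)})."
            for g in sorted(annotations)
            for t in sorted(annotations[g])
        ],
    }


def term_coverage(
    term: str,
    positives: List[str],
    negatives: List[str],
    annotations: Dict[str, Set[str]],
) -> Tuple[int, int]:
    """Step 3 (simplified): count examples covered by the one-literal rule
    disease_related(G) :- has_term(G, term)."""
    covered_pos = sum(term in annotations.get(g, set()) for g in positives)
    covered_neg = sum(term in annotations.get(g, set()) for g in negatives)
    return covered_pos, covered_neg


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy inputs invented for illustration only.
    importance = {"geneA": 0.12, "geneB": 0.0, "geneC": 0.03, "geneD": 0.0}
    annotations = {
        "geneA": {"GO:0006915", "GO:0006955"},
        "geneB": {"GO:0006955"},
        "geneC": {"GO:0006915"},
        "geneD": set(),
    }
    pos, neg = split_by_importance(importance)
    for section, facts in to_ilp_facts(pos, neg, annotations).items():
        print(section, facts)
    print(term_coverage("GO:0006915", pos, neg, annotations))
```

A full ILP system searches over multi-literal first-order clauses; the single-term coverage count here only illustrates how a candidate rule is scored against the positive and negative samples.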
