{"title":"Proposal of application method of Inductive Logic Programming to microarray data","authors":"Hiromu Ide, M. Umezawa, H. Ohwada","doi":"10.1145/3156346.3156356","journal":{"name":"Proceedings of the 8th International Conference on Computational Systems-Biology and Bioinformatics"},"publicationDate":"2017-12-07","ListUrlMain":"https://doi.org/10.1145/3156346.3156356"}
Proposal of application method of Inductive Logic Programming to microarray data
This paper describes a method for identifying terms common to genes in microarray data, in three steps. First, we use a random forest to extract disease-related genes. The random forest assigns each gene a variable importance; the higher the variable importance, the more effective the gene is as a feature for classification. We take the genes whose variable importance is greater than 0 as positive samples for ILP and the remaining genes as negative samples. Next, we annotate the extracted genes using Gene Ontology (GO) and use the resulting terms as predicates for ILP. Annotation is the process of assigning GO terms to gene products. Finally, we use ILP to obtain rules about the terms common to the positive samples. ILP is a subfield of machine learning that uses logic programming as a uniform representation for examples, background knowledge, and hypotheses. ILP learns from background knowledge, which is represented in first-order logic. As a result, we extracted 1051 mRNAs from the random forest as positive samples for ILP, and the random forest's F-measure was 65.1%. We obtained about 4000 terms for each dataset and used them as predicates for ILP. We ultimately obtained several rules about the positive samples.
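The first two steps can be illustrated with a minimal sketch. It assumes that per-gene variable importances and GO annotations are already available as plain dictionaries; the gene names, importance values, GO term assignments, the `disease_related` target predicate, and the Prolog-style fact layout are all hypothetical and are not taken from the paper. The sketch only shows how a threshold of 0 splits genes into positive and negative samples and how GO terms could become predicates; it does not perform the ILP rule search itself.

```python
# Hypothetical inputs: gene names, importances, and GO terms are made up.
importance = {
    "gene_a": 0.031,
    "gene_b": 0.0,
    "gene_c": 0.012,
    "gene_d": 0.0,
}
go_terms = {
    "gene_a": ["GO:0006915", "GO:0005515"],
    "gene_b": ["GO:0005515"],
    "gene_c": ["GO:0006915"],
    "gene_d": [],
}


def split_examples(importance):
    """Genes with importance > 0 are positive samples; the rest are negative."""
    positives = sorted(g for g, v in importance.items() if v > 0)
    negatives = sorted(g for g, v in importance.items() if v <= 0)
    return positives, negatives


def go_atom(term):
    """Turn a GO ID such as 'GO:0006915' into a lowercase, Prolog-safe atom."""
    return term.lower().replace(":", "_")


def background_facts(go_terms):
    """Emit one fact per (gene, GO term) pair, using the GO term as predicate."""
    facts = []
    for gene in sorted(go_terms):
        for term in sorted(go_terms[gene]):
            facts.append(f"{go_atom(term)}({gene}).")
    return facts


positives, negatives = split_examples(importance)

# 'disease_related' is an assumed name for the target predicate.
positive_examples = [f"disease_related({g})." for g in positives]
negative_examples = [f"disease_related({g})." for g in negatives]
facts = background_facts(go_terms)

print(positive_examples)
print(negative_examples)
print(facts)
```

In this layout, an ILP system would receive the positive and negative example lists together with the background facts and search for clauses over the GO-term predicates that cover the positive samples while excluding the negative ones.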