Authors: Hyungmin Lee, Miyoung Shin, Munpyo Hong
Venue: 2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Published: 2010-12-01
DOI: 10.1109/BIBM.2010.5706616
A gene ranking method using text-mining for the identification of disease related genes
Microarray gene expression profiles have been widely used to prioritize candidate genes when identifying significant genes involved in specific diseases. In this paper, we propose a new gene ranking method that combines gene-gene relations extracted from the literature with gene expression scores obtained from microarrays. The gene-gene relations are extracted with a hybrid approach that combines syntactic analysis with co-occurrence: we syntactically parse the text and then treat gene names that co-occur within the same clause of a parsed sentence as mutually related. The gene network derived from these relations and the gene expression scores are both given as inputs to the GeneRank algorithm. To evaluate our approach, we conducted experiments on publicly available prostate cancer data. The results show that our method achieves higher precision and recall than the original GeneRank, which uses gene-gene relations built from Gene Ontology annotations. Furthermore, our hybrid approach to gene-gene relation extraction places truly disease-related genes in the top ranks more effectively than the popular co-occurrence approach.
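The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the syntactic parser has already split sentences into clauses (passed in here as plain strings), that gene names are matched against a small hypothetical lexicon by whitespace tokenization, and that ranking follows a PageRank-style update of the kind GeneRank uses, r = (1 - d) * ex + d * W * D^-1 * r. The function names, the damping value d = 0.5, the toy clauses, and the expression scores are all illustrative assumptions.

```python
from itertools import combinations


def build_network(clauses, gene_names):
    """Link every pair of known gene names that co-occur in one clause."""
    edges = set()
    for clause in clauses:
        # Naive whitespace tokenization; a real system would use a gene tagger.
        found = sorted({tok for tok in clause.split() if tok in gene_names})
        edges.update(combinations(found, 2))
    return edges


def gene_rank(edges, expression, d=0.5, iterations=100):
    """Iterate r = (1 - d) * ex + d * W * D^-1 * r (d is an assumed value)."""
    genes = sorted(expression)
    neighbors = {g: set() for g in genes}
    for a, b in edges:
        if a in neighbors and b in neighbors:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Normalize absolute expression scores so that they sum to 1.
    total = sum(abs(v) for v in expression.values()) or 1.0
    ex = {g: abs(expression[g]) / total for g in genes}
    rank = dict(ex)
    for _ in range(iterations):
        # Each gene receives an equal share of every neighbor's current rank.
        rank = {
            g: (1 - d) * ex[g]
            + d * sum(rank[n] / len(neighbors[n]) for n in neighbors[g])
            for g in genes
        }
    return rank


# Toy input (hypothetical): clauses as a parser might emit them.
clauses = [
    "AR regulates KLK3",
    "TP53 and MDM2 interact",
    "whereas KLK3 is secreted",
]
lexicon = {"AR", "KLK3", "TP53", "MDM2"}
# Made-up expression scores, for illustration only.
expression = {"AR": 2.0, "KLK3": 0.5, "TP53": 1.0, "MDM2": 0.5}

edges = build_network(clauses, lexicon)
rank = gene_rank(edges, expression)
for gene in sorted(rank, key=rank.get, reverse=True):
    print(gene, round(rank[gene], 4))
```

In this toy input KLK3 and MDM2 have the same expression score, but KLK3 ends up ranked higher because it is linked to AR, which has a larger score. This is the effect the network term is meant to capture. Restricting co-occurrence to a clause rather than a whole sentence or abstract is what distinguishes the hybrid approach from plain co-occurrence.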