A gene ranking method using text-mining for the identification of disease related genes

2010 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)
Pub Date: 2010-12-01
DOI: 10.1109/BIBM.2010.5706616

Hyungmin Lee, Miyoung Shin, Munpyo Hong


Citations: 4

Abstract


A gene ranking method using text-mining for the identification of disease related genes

Microarray gene expression profiles have been widely used to prioritize candidate genes when identifying significant genes involved in specific diseases. In this paper, we propose a new gene ranking method that combines gene-gene relations extracted from the literature with gene expression scores obtained from microarrays. The gene-gene relations are extracted with a hybrid approach that combines syntactic analysis with co-occurrence: we first parse the text syntactically and then, within each clause of a parsed sentence, treat the co-occurring gene names as mutually related. The gene network derived from these gene-gene relations and the gene expression scores are both given as inputs to the GeneRank algorithm. To evaluate our approach, we conducted experiments with publicly available prostate cancer data. The results show that our method achieves higher precision and recall than the original GeneRank, which uses gene-gene relations built from Gene Ontology annotations. Furthermore, our hybrid approach to gene-gene relation extraction prioritizes truly disease-related genes in the top ranks better than the existing, popular co-occurrence approach.
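The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the syntactic parser and gene-name recognizer have already run, so each clause arrives as a plain list of gene names. The ranking step uses the commonly cited GeneRank update, r = (1 - d)·ex + d·W·D⁻¹·r, solved here by simple iteration. The function names, the damping value d = 0.5, the normalization of expression scores by their sum, and the placeholder genes G1 to G4 are all illustrative assumptions and are not taken from the paper.

```python
from itertools import combinations


def clause_cooccurrence_edges(clauses):
    """Link every pair of distinct genes that appear in the same clause.

    `clauses` is a list of lists of gene names; producing it (parsing,
    clause splitting, gene-name recognition) is assumed to happen upstream.
    """
    edges = set()
    for genes in clauses:
        for a, b in combinations(sorted(set(genes)), 2):
            edges.add((a, b))
    return edges


def generank(edges, expression, d=0.5, tol=1e-10, max_iter=1000):
    """Iterate r = (1 - d) * ex + d * W * D^-1 * r until it stabilizes.

    `expression` maps gene -> expression score; `edges` is a set of
    undirected gene pairs. Edges naming unknown genes are ignored.
    """
    genes = sorted(expression)
    neighbours = {g: set() for g in genes}
    for a, b in edges:
        if a in neighbours and b in neighbours:
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Assumption: use absolute scores, scaled so they sum to 1.
    total = sum(abs(v) for v in expression.values())
    if total == 0:
        raise ValueError("expression scores must not all be zero")
    ex = {g: abs(expression[g]) / total for g in genes}

    rank = dict(ex)
    for _ in range(max_iter):
        new_rank = {}
        for g in genes:
            # Each neighbour shares its rank equally among its own links.
            spread = sum(rank[n] / len(neighbours[n]) for n in neighbours[g])
            new_rank[g] = (1 - d) * ex[g] + d * spread
        delta = max(abs(new_rank[g] - rank[g]) for g in genes)
        rank = new_rank
        if delta < tol:
            break
    return rank


if __name__ == "__main__":
    # Toy input: two clauses and made-up expression scores.
    toy_clauses = [["G1", "G2"], ["G2", "G3", "G4"]]
    toy_expression = {"G1": 2.0, "G2": 0.5, "G3": -1.0, "G4": 0.5}
    toy_edges = clause_cooccurrence_edges(toy_clauses)
    ranks = generank(toy_edges, toy_expression)
    for gene in sorted(ranks, key=ranks.get, reverse=True):
        print(gene, round(ranks[gene], 4))
```

Restricting co-occurrence to a clause rather than a whole sentence or abstract is what distinguishes the hybrid extraction from plain co-occurrence: genes mentioned in different clauses of the same sentence are not linked. With d = 0 the ranking reduces to the expression scores alone, and larger d gives the literature-derived network more weight.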
