TILD: A Strategy to Identify Cancer-related Genes Using Title Information in Literature Data

Data and Text Mining in Bioinformatics. Pub Date: 2014-11-07. DOI: 10.1145/2665970.2665992

Jeongwoo Kim, H. Kim, Yunku Yeu, Mincheol Shin, Sanghyun Park


Abstract

Since the genome project of the 1990s, gene-related research has advanced considerably. These studies revealed that genes can cause disease and that the relations between genes and diseases are important. For this reason, we propose a strategy called TILD that identifies cancer-related genes using title information in literature data. To implement our method, we selected cancer-specific literature from an online database and extracted genes from it using text mining. We then classified the extracted genes into two kinds using title information: genes that appear in a paper's title are classified as hub genes, whereas genes that appear in the body are classified as sub genes connected to the hub genes. We repeated this process for each paper to construct a cancer-specific local gene network. Finally, we constructed a global cancer-specific gene network by integrating all local gene networks, and calculated a score for each gene based on an analysis of the global network. We assume that genes in the title have a meaningful relation to cancer and that other genes in the body are related to the title genes. For validation, we compared the top 20 genes inferred by our approach with those inferred by other methods. Our approach found more cancer-related genes than the comparable methods.
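The pipeline described in the abstract (hub and sub genes per paper, local networks, one integrated global network, a per-gene score) can be illustrated with a minimal sketch. The abstract does not say how genes are extracted, how edges are weighted, or how the score is computed, so everything below is an assumption made for illustration: the function names `local_network`, `global_network`, and `score_genes` are hypothetical, gene extraction is taken as already done, edges are weighted by the number of papers in which they occur, and the score is simply the weighted degree of each gene. The input papers are toy data, not results from the study.

```python
from collections import defaultdict


def local_network(title_genes, body_genes):
    """Build one paper's local network as a set of (hub, sub) edges.

    Genes found in the title are hubs; genes found only in the body
    are subs, each connected to every hub of the same paper.
    """
    hubs = set(title_genes)
    subs = set(body_genes) - hubs  # a title gene is never its own sub
    return {(hub, sub) for hub in hubs for sub in subs}


def global_network(papers):
    """Integrate all local networks into one weighted edge map.

    ASSUMPTION: an edge's weight is the number of papers whose local
    network contains it. The abstract does not specify the weighting.
    """
    weights = defaultdict(int)
    for title_genes, body_genes in papers:
        for edge in local_network(title_genes, body_genes):
            weights[edge] += 1
    return dict(weights)


def score_genes(weights):
    """Rank genes by weighted degree in the global network.

    ASSUMPTION: weighted degree stands in for the paper's scoring
    function, which the abstract does not describe.
    """
    scores = defaultdict(int)
    for (hub, sub), weight in weights.items():
        scores[hub] += weight
        scores[sub] += weight
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy input: (genes in title, genes in body) for three hypothetical papers.
papers = [
    (["TP53"], ["MDM2", "BRCA1"]),
    (["BRCA1"], ["TP53", "ATM"]),
    (["TP53"], ["MDM2"]),
]

ranking = score_genes(global_network(papers))
for gene, score in ranking:
    print(gene, score)
```

In this sketch the edge direction (hub, sub) is preserved, so a gene that is a hub in one paper and a sub in another accumulates score from both roles. A top-k list like the top 20 used for validation would then be `ranking[:20]`.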
