Constructing biological knowledge bases by extracting information from text sources.

Proceedings. International Conference on Intelligent Systems for Molecular Biology. Pub date: 1999-01-01

M Craven, J Kumlien


Abstract

Recently, there has been much effort in making databases for molecular biology more accessible and interoperable. However, information in text form, such as MEDLINE records, remains a greatly underutilized source of biological information. We have begun a research effort aimed at automatically mapping information from text sources into structured representations, such as knowledge bases. Our approach to this task is to use machine-learning methods to induce routines for extracting facts from text. We describe two learning methods that we have applied to this task (a statistical text classification method and a relational learning method) and our initial experiments in learning such information-extraction routines. We also present an approach to decreasing the cost of learning information-extraction routines by learning from "weakly" labeled training data.
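The abstract does not spell out either learning method. The Python below is a minimal sketch of the general idea, not the authors' implementation. It assumes a database of known (protein, subcellular location) pairs and works at the sentence level. Training sentences are "weakly" labeled by checking whether they mention both members of a known pair. Those noisy labels are then used to train a multinomial naive Bayes classifier, chosen here as one representative statistical text classification method. The fact set, the example sentences, and all names (`KNOWN_FACTS`, `EXAMPLE_SENTENCES`, `weak_label`, `NaiveBayes`) are hypothetical. The relational learning method is not shown.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL knowledge-base tuples (protein, subcellular location).
# In practice these would come from an existing curated database.
KNOWN_FACTS = {("ypt1", "golgi"), ("sec61", "endoplasmic reticulum")}

# HYPOTHETICAL example sentences standing in for text such as MEDLINE abstracts.
EXAMPLE_SENTENCES = [
    "Ypt1 is localized to the Golgi apparatus.",
    "Sec61 resides in the endoplasmic reticulum membrane.",
    "Ypt1 expression increased under heat stress.",
    "The Golgi stacks were imaged by electron microscopy.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weak_label(sentence, facts):
    """Return 1 if the sentence mentions both parts of some known fact, else 0.

    The label is 'weak' (noisy): co-occurrence in a sentence does not
    guarantee that the sentence actually asserts the fact.
    """
    padded = " " + " ".join(tokenize(sentence)) + " "
    for protein, location in facts:
        prot = " " + " ".join(tokenize(protein)) + " "
        loc = " " + " ".join(tokenize(location)) + " "
        if prot in padded and loc in padded:
            return 1
    return 0


class NaiveBayes:
    """Multinomial naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)  # number of documents per class
        self.word_counts = {c: Counter() for c in self.doc_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set()
        for counts in self.word_counts.values():
            self.vocab.update(counts)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_score(self, doc, c):
        """Unnormalized log posterior: log P(c) + sum over tokens of log P(w | c)."""
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[c] / n_docs)
        denom = self.total_words[c] + len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((self.word_counts[c][w] + 1) / denom)
        return score

    def predict(self, doc):
        """Return the class with the highest log posterior."""
        return max(sorted(self.doc_counts), key=lambda c: self.log_score(doc, c))


if __name__ == "__main__":
    labels = [weak_label(s, KNOWN_FACTS) for s in EXAMPLE_SENTENCES]
    print(labels)  # [1, 1, 0, 0]
    model = NaiveBayes().fit(EXAMPLE_SENTENCES, labels)
    print(model.predict("Sec61 is localized to the endoplasmic reticulum."))  # 1
    print(model.predict("Heat stress increased expression of Ypt1."))  # 0
```

The attraction of the weak-labeling step is that no sentence has to be annotated by hand. The labels come for free from facts already stored in a database, at the price of some label noise. For example, a sentence could mention Ypt1 and the Golgi without stating that Ypt1 is located there.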
