A multi-level text mining method to extract biological relationships.

Proceedings. IEEE Computer Society Bioinformatics Conference Pub Date : 2002-01-01

Mathew Palakal, Matthew Stephens, Snehasis Mukhopadhyay, Rajeev Raje, Simon Rhodes


Abstract

Accurate and computationally efficient approaches to discovering relationships between biological objects in text documents are important for biologists developing biological models. This paper presents a novel approach to extracting relationships between multiple biological objects that appear in a text document. The approach involves object identification, reference resolution, ontology and synonym discovery, and extraction of object-object relationships. Hidden Markov Models (HMMs), dictionaries, and N-Gram models provide the framework for tackling the complex task of extracting object-object relationships. Experiments were carried out on a corpus of one thousand Medline abstracts. Intermediate results were obtained for object identification, synonym discovery, and finally relationship extraction. From this corpus, 53 relationships were extracted, of which 43 were correct, giving a specificity of 81%. Unlike rule-based methods, the approach is both adaptable and scalable to new problems.
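To make the stages named in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a dictionary-only stand-in for object identification and synonym resolution (the paper also uses HMMs and N-Gram models, which are not reproduced here) and treats sentence-level co-occurrence as a candidate relationship. The `SYNONYMS` table, the function names, and the example sentence are all hypothetical. The last line reproduces the reported ratio of 43 correct relationships out of 53 extracted.

```python
import re
from itertools import combinations

# Hypothetical synonym dictionary: lowercase surface form -> canonical name.
# The entries are illustrative only and do not come from the paper.
SYNONYMS = {
    "p53": "TP53",
    "tp53": "TP53",
    "mdm2": "MDM2",
    "hdm2": "MDM2",
}


def identify_objects(sentence):
    """Return canonical names of dictionary objects in order of first appearance."""
    found = []
    for token in re.findall(r"[A-Za-z0-9]+", sentence):
        canonical = SYNONYMS.get(token.lower())
        if canonical and canonical not in found:
            found.append(canonical)
    return found


def candidate_relationships(text):
    """Pair distinct objects that co-occur in the same sentence (an assumed heuristic)."""
    pairs = set()
    # Naive sentence split on terminal punctuation followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        objects = identify_objects(sentence)
        for a, b in combinations(sorted(objects), 2):
            pairs.add((a, b))
    return pairs


def fraction_correct(extracted, correct):
    """Share of extracted relationships that were judged correct."""
    return correct / extracted


text = "HDM2 binds p53. The p53 protein is degraded."
print(candidate_relationships(text))          # {('MDM2', 'TP53')}
print(round(100 * fraction_correct(53, 43)))  # 81
```

Resolving synonyms to a canonical name before pairing means that mentions such as "HDM2" and "MDM2" contribute to the same candidate relationship rather than to two separate ones.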
