A Hybrid Machine Learning Approach for Information Extraction

2006 Sixth International Conference on Hybrid Intelligent Systems (HIS'06). Pub Date: 2006-12-13. DOI: 10.1109/HIS.2006.3

Eduardo F. A. Silva, F. Barros, R. Prudêncio

Citations: 9

Abstract

Information Extraction (IE) aims to extract from textual documents only the relevant data required by the user. In this paper, we propose a hybrid machine learning approach for IE on semi-structured texts that combines conventional text classification techniques and Hidden Markov Models (HMM). In this approach, a text classifier generates an initial output, which is then refined by an HMM to provide a globally optimal extraction. An implemented prototype was used to extract information from bibliographic references, and the HMM refinement yielded a consistent gain in performance.

