A Hybrid Machine Learning Approach for Information Extraction

2006 Sixth International Conference on Hybrid Intelligent Systems (HIS'06). Publication date: 2006-12-13. DOI: 10.1109/HIS.2006.3

Eduardo F. A. Silva, F. Barros, R. Prudêncio


Citations: 9

Abstract


A Hybrid Machine Learning Approach for Information Extraction

Information Extraction (IE) aims to extract from textual documents only the relevant data required by the user. In this paper, we propose a hybrid machine learning approach for IE on semi-structured texts that combines conventional text classification techniques and Hidden Markov Models (HMM). In this approach, a text classifier technique generates an initial output, which is refined by an HMM, providing a globally optimal extraction. An implemented prototype was used to extract information from bibliographic references, reaching a consistent gain in performance through the use of the HMM.
