A Hybrid Machine Learning Approach for Information Extraction

Eduardo F. A. Silva, F. Barros, R. Prudêncio
Published in: 2006 Sixth International Conference on Hybrid Intelligent Systems (HIS'06), 2006-12-13
DOI: 10.1109/HIS.2006.3 (https://doi.org/10.1109/HIS.2006.3)
Citations: 9

Abstract

Information Extraction (IE) aims to extract from textual documents only the relevant data required by the user. In this paper, we propose a hybrid machine learning approach for IE on semi-structured texts that combines conventional text classification techniques and Hidden Markov Models (HMM). In this approach, a text classifier technique generates an initial output, which is refined by an HMM, providing a globally optimal extraction. An implemented prototype was used to extract information from bibliographic references, reaching a consistent gain in performance through the use of the HMM.
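The two-stage idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the classifier output is passed to the HMM, so the sketch assumes that per-token classifier probabilities serve as emission scores and that Viterbi decoding is used to find the best label sequence. The label set, tokens, and all probabilities below are hypothetical values chosen for illustration.

```python
import math

# Hypothetical label set for fields of a bibliographic reference.
LABELS = ["author", "title", "year"]


def viterbi(emissions, start, trans):
    """Return the highest-scoring label sequence.

    emissions: list of dicts, label -> classifier probability for that token
    start:     dict, label -> probability of starting in that label
    trans:     dict, (previous label, current label) -> transition probability
    """
    def lg(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][label] = best log score of any path ending in `label` at token t
    best = [{l: lg(start[l]) + lg(emissions[0][l]) for l in LABELS}]
    back = []  # back[t-1][label] = best previous label for `label` at token t
    for e in emissions[1:]:
        scores, ptrs = {}, {}
        for cur in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + lg(trans[(p, cur)]))
            scores[cur] = best[-1][prev] + lg(trans[(prev, cur)]) + lg(e[cur])
            ptrs[cur] = prev
        best.append(scores)
        back.append(ptrs)

    # Follow back-pointers from the best final label.
    path = [max(LABELS, key=lambda l: best[-1][l])]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return path[::-1]


# Hypothetical tokens from one reference and made-up classifier outputs.
# The token "A" looks like an author initial, so the classifier prefers "author".
tokens = ["Silva", "E.", "Extraction", "A", "Hybrid", "Approach", "2006"]
emissions = [
    {"author": 0.80, "title": 0.15, "year": 0.05},
    {"author": 0.90, "title": 0.05, "year": 0.05},
    {"author": 0.10, "title": 0.85, "year": 0.05},
    {"author": 0.55, "title": 0.40, "year": 0.05},
    {"author": 0.10, "title": 0.85, "year": 0.05},
    {"author": 0.10, "title": 0.85, "year": 0.05},
    {"author": 0.05, "title": 0.05, "year": 0.90},
]
start = {"author": 0.8, "title": 0.1, "year": 0.1}
trans = {
    ("author", "author"): 0.60, ("author", "title"): 0.35, ("author", "year"): 0.05,
    ("title", "author"): 0.05, ("title", "title"): 0.75, ("title", "year"): 0.20,
    ("year", "author"): 0.05, ("year", "title"): 0.05, ("year", "year"): 0.90,
}

# Stage 1: each token labeled independently by its most probable class.
greedy = [max(e, key=e.get) for e in emissions]
# Stage 2: sequence-level refinement using the transition structure.
refined = viterbi(emissions, start, trans)

for tok, g, r in zip(tokens, greedy, refined):
    print(f"{tok:12s} classifier={g:7s} refined={r}")
```

With these made-up numbers, the independent classifier labels the token "A" as an author, while the sequence-level decoding relabels it as part of the title, because a title-author-title sequence is unlikely under the assumed transition probabilities.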