A Hybrid Machine Learning Approach for Information Extraction
Eduardo F. A. Silva, F. Barros, R. Prudêncio
2006 Sixth International Conference on Hybrid Intelligent Systems (HIS'06), 2006-12-13
DOI: 10.1109/HIS.2006.3 (https://doi.org/10.1109/HIS.2006.3)
Citations: 9
Abstract
Information Extraction (IE) aims to extract from textual documents only the data relevant to the user. In this paper, we propose a hybrid machine learning approach for IE on semi-structured texts that combines conventional text classification techniques with Hidden Markov Models (HMMs). In this approach, a text classifier generates an initial output, which an HMM then refines to yield a globally optimal extraction. An implemented prototype was used to extract information from bibliographic references, and the use of the HMM gave a consistent gain in performance.
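
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the HMM's hidden states are taken to be the true field labels of a bibliographic reference, the observations are the labels predicted token by token by the classifier, decoding uses the standard Viterbi algorithm, and all probabilities are hand-set for illustration rather than learned. The names `viterbi`, `classifier_labels` and the three fields `author`, `title`, `year` are hypothetical.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""

    def log(p):
        # Log space avoids numeric underflow on long sequences.
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s][observations[t]])
            back[t][s] = prev

    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical field labels for tokens of a bibliographic reference.
states = ["author", "title", "year"]

# Illustrative, hand-set parameters (assumptions, not values from the paper).
start_p = {"author": 0.8, "title": 0.1, "year": 0.1}
trans_p = {
    "author": {"author": 0.60, "title": 0.35, "year": 0.05},
    "title": {"author": 0.05, "title": 0.70, "year": 0.25},
    "year": {"author": 0.05, "title": 0.05, "year": 0.90},
}
# The classifier is assumed to be right 80% of the time for every field.
emit_p = {s: {o: (0.8 if o == s else 0.1) for o in states} for s in states}

# Stage 1 (assumed): per-token labels from a local classifier,
# with an isolated "author" inside the title span.
classifier_labels = ["author", "author", "title", "author", "title", "title", "year"]

# Stage 2: the HMM picks the label sequence that is most probable as a whole.
refined = viterbi(classifier_labels, states, start_p, trans_p, emit_p)
print(refined)
# ['author', 'author', 'title', 'title', 'title', 'title', 'year']
```

In this toy run the classifier labels each token independently, so a single local mistake survives in its output. The decoding step scores entire label sequences, and the low `title` to `author` transition probability makes the isolated `author` label less likely than a continuous title span. This is one plausible reading of "globally optimal" in the abstract, not a reconstruction of the authors' prototype.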