Information Extraction

Handbook of Natural Language Processing. Pub Date: 1900-01-01. DOI: 10.1201/9781420085938-c21

Jerry R. Hobbs



Abstract

Information Extraction (IE) techniques aim to extract the names of entities and objects from text and to identify the roles that they play in event descriptions. IE systems generally focus on a specific domain or topic, searching only for information that is relevant to a user's interests. In this chapter, we first give historical background on information extraction and discuss several kinds of information extraction tasks that have emerged in recent years. Next, we outline the series of steps that are involved in creating a typical information extraction system, which can be encoded as a cascaded finite-state transducer. Along the way, we present examples to illustrate what each step does. Finally, we present an overview of different learning-based methods for information extraction, including supervised learning approaches, weakly supervised and bootstrapping techniques, and discourse-oriented approaches.

Information extraction (IE) is the process of scanning text for information relevant to some interest, including extracting entities, relations, and, most challenging, events: who did what to whom, when, and where. It requires deeper analysis than keyword searches, but its aims fall short of the very hard and long-term problem of text understanding, where we seek to capture all the information in a text, along with the speaker's or writer's intention.
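
As a loose illustration of the cascaded design mentioned in the abstract, the sketch below chains two pattern-matching stages: the first tags entity names, and the second matches an event pattern over the first stage's output to fill a "who did what to whom, when" template. It is not the chapter's system. Regular expressions stand in for finite-state transducers, and the company list, the `[ORG ...]` tag format, the acquisition pattern, and all function names are assumptions invented for this example.

```python
import re

# Stage 1: entity recognition.
# Hypothetical toy gazetteer; a real system would use far richer resources.
COMPANIES = ["Acme Corp", "Globex"]
ORG_PATTERN = re.compile("|".join(re.escape(name) for name in COMPANIES))


def tag_entities(text):
    """Rewrite the text so each known company name becomes [ORG name]."""
    return ORG_PATTERN.sub(lambda m: f"[ORG {m.group(0)}]", text)


# Stage 2: event recognition over the *tagged* output of stage 1.
# Assumed pattern: "<ORG> acquired|bought <ORG> [in YYYY]".
ACQUISITION = re.compile(
    r"\[ORG (?P<buyer>[^\]]+)\] (?:acquired|bought) "
    r"\[ORG (?P<target>[^\]]+)\]"
    r"(?: in (?P<year>\d{4}))?"
)


def extract_events(tagged_text):
    """Fill one event template per match of the acquisition pattern."""
    return [
        {
            "event": "acquisition",
            "buyer": m.group("buyer"),
            "target": m.group("target"),
            "when": m.group("year"),  # None when no year is stated
        }
        for m in ACQUISITION.finditer(tagged_text)
    ]


def extract(text):
    """Run the cascade: each stage consumes the previous stage's output."""
    return extract_events(tag_entities(text))


if __name__ == "__main__":
    print(extract("Acme Corp acquired Globex in 2009."))
```

Because the second stage sees only the tagged text, the event pattern can refer to `[ORG ...]` units rather than to specific names. That separation between stages is the point of a cascade: each level works over the structures built by the level below it.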


