Extracting Useful Information from the Full Text of Fiction

RIAO Conference. Pub Date: 2007-05-30. DOI: 10.5555/1931390.1931450

Sharon Givon, Maria Milosavljevic

Citations: 0

Abstract

In this paper, we describe some experiments in large-scale Information Extraction (IE) focusing on book texts. We investigate the scalability of IE techniques to full-sized books, and the utility of IE techniques in extracting useful information from fiction. In particular, we evaluate a variety of Named Entity Recognition (NER) techniques in identifying the central characters in works of fiction. First, we describe the creation of a gold standard for evaluation, which contains ordered lists of characters for a corpus of classic book texts in Project Gutenberg. Second, we describe several approaches to the task of character identification, where our best model achieves an average coverage score of 78.4% across all central characters. Finally, we propose a number of approaches for future work.
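
As a rough illustration of the character identification task, the sketch below ranks candidate names by mention frequency and scores the ranking against a gold list. It is not the authors' method: the capitalized-token heuristic stands in for a real NER system, the names `rank_characters`, `coverage`, and `NON_NAMES` are hypothetical, and the coverage definition used here (the fraction of gold characters that appear in the predicted list) is an assumption, since the abstract does not define the paper's coverage score.

```python
import re
from collections import Counter

# Hypothetical stop list: capitalized words that often start sentences
# but are not names. A real NER system would replace this heuristic.
NON_NAMES = {"The", "A", "An", "He", "She", "It", "They", "We", "I",
             "But", "And", "In", "On", "At", "Then", "When"}


def rank_characters(text, top_k=10):
    """Rank candidate character names by how often they are mentioned."""
    # Assumption: a name is a single capitalized alphabetic token.
    tokens = re.findall(r"\b[A-Z][a-z]+\b", text)
    counts = Counter(t for t in tokens if t not in NON_NAMES)
    return [name for name, _ in counts.most_common(top_k)]


def coverage(predicted, gold):
    """Fraction of gold-standard characters found in the predicted list.

    Assumed definition for illustration; the paper's score may differ.
    """
    if not gold:
        return 0.0
    found = sum(1 for name in gold if name in predicted)
    return found / len(gold)


sample = ("Elizabeth met Darcy. Darcy spoke to Elizabeth. "
          "Then Jane arrived. Elizabeth smiled at Jane.")
ranked = rank_characters(sample)
score = coverage(ranked, ["Elizabeth", "Darcy", "Wickham"])
print(ranked, round(score, 3))
```

A frequency ranking like this mirrors the idea of an ordered character list, but it will miss multi-word names and aliases (for example "Mr. Darcy" versus "Darcy"), which is the kind of gap a proper NER pipeline would be expected to close.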
