Quotation Detection and Classification with a Corpus-Agnostic Model

Recent Advances in Natural Language Processing. Pub Date: 2019-10-22. DOI: 10.26615/978-954-452-056-4_103

Sean Papay, Sebastian Padó

Citations: 10

Abstract

The detection of quotations (i.e., reported speech, thought, and writing) has established itself as an NLP analysis task. However, state-of-the-art models have been developed on the basis of specific corpora and incorporate a high degree of corpus-specific assumptions and knowledge, which leads to fragmentation. In the spirit of task-agnostic modeling, we present a corpus-agnostic neural model for quotation detection and evaluate it on three corpora that vary in language, text genre, and structural assumptions. The model (a) approaches the state-of-the-art on the corpora when using established feature sets and (b) shows reasonable performance even when using solely word forms, which makes it applicable for non-standard (i.e., historical) corpora.

