Multi-lingual and Cross-genre Discourse Unit Segmentation

Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019. DOI: 10.18653/V1/W19-2714

Peter Bourgonje, Robin Schäfer

Citations: 7

Abstract

We describe a series of experiments on data sets from different languages and genres, annotated for coherence relations according to different theoretical frameworks. Specifically, we investigate the feasibility of a unified (theory-neutral) approach to discourse segmentation, the process of dividing a text into the minimal discourse units that participate in a coherence relation. We apply a Random Forest and an LSTM-based approach to all data sets, and we improve over a simple baseline that assumes sentence or clause-like segmentation. Performance, however, varies considerably with language and, more importantly, with genre, with F-scores ranging from 73.00 to 94.47.
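
To make the task and the baselines concrete, here is a minimal sketch. It is not the authors' implementation. It assumes segmentation is framed as labeling each token as either beginning a discourse unit or not, and it scores predictions by F1 over boundary positions. The helper names `sentence_baseline`, `clause_baseline`, and `boundary_f1` are hypothetical, and the "clause-like" rule (start a unit after every comma) is an illustrative stand-in rather than the paper's baseline definition.

```python
from typing import List, Set, Tuple

Sentences = List[List[str]]   # tokenized sentences
Labels = List[List[bool]]     # True = token begins a discourse unit


def sentence_baseline(sentences: Sentences) -> Labels:
    # Hypothetical baseline: every sentence is exactly one discourse unit.
    return [[i == 0 for i in range(len(sent))] for sent in sentences]


def clause_baseline(sentences: Sentences) -> Labels:
    # Hypothetical clause-like heuristic: also start a unit after a comma.
    return [
        [i == 0 or sent[i - 1] == "," for i in range(len(sent))]
        for sent in sentences
    ]


def _positions(labels: Labels) -> Set[Tuple[int, int]]:
    # Collect (sentence index, token index) of every predicted boundary.
    return {(s, t) for s, sent in enumerate(labels) for t, b in enumerate(sent) if b}


def boundary_f1(gold: Labels, pred: Labels) -> Tuple[float, float, float]:
    # Precision, recall and F1 over segment-start positions.
    gold_pos, pred_pos = _positions(gold), _positions(pred)
    tp = len(gold_pos & pred_pos)
    precision = tp / len(pred_pos) if pred_pos else 0.0
    recall = tp / len(gold_pos) if gold_pos else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    sents = [["Although", "it", "rained", ",", "we", "went", "out", "."],
             ["We", "got", "wet", "."]]
    # Invented toy gold standard: units start at "Although", "we" and "We".
    gold = [[i in (0, 4) for i in range(8)], [i == 0 for i in range(4)]]
    print(boundary_f1(gold, sentence_baseline(sents)))  # precision 1.0, recall ~0.667, F1 ~0.8
    print(boundary_f1(gold, clause_baseline(sents)))    # (1.0, 1.0, 1.0)
```

On this invented example the sentence baseline misses the unit boundary inside the first sentence, which is the kind of sub-sentential split that learned segmenters such as the Random Forest and LSTM models above are meant to recover.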
