Exploring Coreference Features in Heterogeneous Data with Text Classification

Proceedings of the First Workshop on Computational Approaches to Discourse, published 2020-11-01, DOI: 10.18653/v1/2020.codi-1.6

Ekaterina Lapshinova-Koltunski, K. Kunz

Citations: 4

Abstract

The present paper focuses on variation phenomena in coreference chains. We address the hypothesis that the degree of structural variation between chain elements depends on language-specific constraints and preferences and, even more, on the communicative situation of language production. We define coreference features that also include reference to abstract entities and events. These features are inspired by several sources: cognitive parameters, pragmatic factors and typological status. We examine the distributions of these features in a dataset containing English and German texts in spoken and written discourse mode, which can be classified into seven different registers. We apply text classification and feature selection to find out how these variational dimensions (language, mode and register) impact coreference features. Knowledge of the variation under analysis is valuable for contrastive linguistics, translation studies and multilingual natural language processing (NLP), e.g. machine translation or cross-lingual coreference resolution.
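The abstract names two techniques, text classification and feature selection, but does not specify the classifier or the selection criterion. As a rough illustration only, the sketch below ranks coreference features by a one-way ANOVA F score across one variational dimension (here spoken vs. written mode) and then checks how well the top-ranked feature alone separates the classes with a leave-one-out nearest-centroid classifier. The feature names, the toy values, the F-score criterion and the nearest-centroid classifier are all assumptions chosen for illustration; they are not the authors' setup, features or data.

```python
from statistics import mean

# Hypothetical coreference features per text (relative frequencies, plus a mean
# chain length). Names and values are invented for illustration, NOT from the paper.
FEATURES = ["personal_pron", "demonstrative_pron", "abstract_anaphora", "mean_chain_len"]
X = [
    [0.060, 0.020, 0.010, 3.1],  # spoken
    [0.055, 0.025, 0.012, 2.9],  # spoken
    [0.065, 0.018, 0.009, 3.3],  # spoken
    [0.030, 0.022, 0.011, 3.0],  # written
    [0.028, 0.019, 0.013, 3.2],  # written
    [0.033, 0.024, 0.010, 2.8],  # written
]
MODE = ["spoken"] * 3 + ["written"] * 3  # labels for one variational dimension (mode)


def f_score(values, labels):
    """One-way ANOVA F statistic of a single feature across the label classes."""
    classes = sorted(set(labels))
    groups = [[v for v, lab in zip(values, labels) if lab == c] for c in classes]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_between = len(classes) - 1
    df_within = len(values) - len(classes)
    if ss_within == 0:
        return float("inf")  # no within-class spread at all
    return (ss_between / df_between) / (ss_within / df_within)


def rank_features(rows, labels, names):
    """Return (feature, F) pairs, most discriminative feature first."""
    scores = [(name, f_score([r[j] for r in rows], labels)) for j, name in enumerate(names)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


def loo_nearest_centroid(rows, labels, cols):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(rows)):
        # Hold out text i and build one centroid per class from the remaining texts.
        train = [(r, lab) for k, (r, lab) in enumerate(zip(rows, labels)) if k != i]
        centroids = {
            c: [mean(r[j] for r, lab in train if lab == c) for j in cols]
            for c in sorted({lab for _, lab in train})
        }
        x = [rows[i][j] for j in cols]
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        correct += pred == labels[i]
    return correct / len(rows)


ranking = rank_features(X, MODE, FEATURES)
top = [FEATURES.index(ranking[0][0])]  # keep only the top-ranked feature
acc = loo_nearest_centroid(X, MODE, top)

for name, score in ranking:
    print(f"{name:20s} F = {score:.3f}")
print(f"LOO accuracy with {ranking[0][0]} only: {acc:.2f}")
```

With real data one would standardize the features (chain length is on a different scale from the frequencies), repeat the ranking separately for language, mode and register, and use a larger held-out evaluation; the toy numbers only demonstrate the mechanics of ranking features by how strongly they separate one variational dimension.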
