Exploring Coreference Features in Heterogeneous Data with Text Classification

Ekaterina Lapshinova-Koltunski, K. Kunz
Proceedings of the First Workshop on Computational Approaches to Discourse. Published 2020-11-01. DOI: 10.18653/v1/2020.codi-1.6 (https://doi.org/10.18653/v1/2020.codi-1.6)
Citations: 4

Abstract

This paper examines variation in coreference chains. We test the hypothesis that the degree of structural variation between chain elements depends on language-specific constraints and preferences and, to an even greater extent, on the communicative situation in which language is produced. We define coreference features that also cover reference to abstract entities and events. These features draw on several sources: cognitive parameters, pragmatic factors and typological status. We examine the distribution of these features in a dataset of English and German texts in spoken and written discourse modes, which fall into seven registers. We apply text classification and feature selection to determine how these dimensions of variation (language, mode and register) affect coreference features. Knowledge of this variation is valuable for contrastive linguistics, translation studies and multilingual natural language processing (NLP), e.g. machine translation or cross-lingual coreference resolution.
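
The pairing of text classification with feature selection described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, the selection method or the individual features, so everything below is an assumption made for illustration: a nearest-centroid classifier with leave-one-out evaluation stands in for the classification step, a Fisher-style score (between-class over within-class variance) stands in for feature selection, and the feature names and values are invented toy data, not figures from the paper.

```python
from statistics import mean, pvariance

# Toy, invented values for illustration only (NOT data from the paper).
# Feature names are hypothetical; each row is (feature vector, mode label).
FEATURES = ["pronoun_ratio", "demonstrative_ratio", "mean_chain_length"]
TEXTS = [
    ([0.62, 0.10, 4.1], "spoken"),
    ([0.58, 0.12, 3.8], "spoken"),
    ([0.65, 0.09, 4.4], "spoken"),
    ([0.31, 0.11, 2.2], "written"),
    ([0.35, 0.10, 2.6], "written"),
    ([0.28, 0.13, 2.0], "written"),
]


def centroids(rows):
    """Return the mean feature vector of each class."""
    by_label = {}
    for vec, label in rows:
        by_label.setdefault(label, []).append(vec)
    return {lab: [mean(col) for col in zip(*vecs)] for lab, vecs in by_label.items()}


def predict(vec, cents):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(cents, key=lambda lab: sum((a - b) ** 2 for a, b in zip(vec, cents[lab])))


def leave_one_out_accuracy(rows):
    """Classify each text with centroids computed from all other texts."""
    hits = 0
    for i, (vec, label) in enumerate(rows):
        train = rows[:i] + rows[i + 1:]
        hits += predict(vec, centroids(train)) == label
    return hits / len(rows)


def fisher_scores(rows, names):
    """Score each feature by between-class variance over within-class variance."""
    labels = sorted({lab for _, lab in rows})
    scores = {}
    for j, name in enumerate(names):
        groups = [[vec[j] for vec, lab in rows if lab == g] for g in labels]
        overall = mean(v for g in groups for v in g)
        between = sum(len(g) * (mean(g) - overall) ** 2 for g in groups)
        within = sum(len(g) * pvariance(g) for g in groups)
        scores[name] = between / within if within else float("inf")
    return scores


accuracy = leave_one_out_accuracy(TEXTS)
ranking = sorted(fisher_scores(TEXTS, FEATURES).items(), key=lambda kv: kv[1], reverse=True)
print(f"leave-one-out accuracy: {accuracy:.2f}")
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this sketch the accuracy indicates how well the coreference features separate the two modes, and the ranking indicates which features carry that separation. Swapping the mode labels for language or register labels would probe the other two dimensions in the same way.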