Multilingual segmentation based on neural networks and pre-trained word embeddings

Mikel Iruskieta, K. Bengoetxea, Aitziber Atutxa Salazar, A. D. Ilarraza
Published in: Proceedings of the Workshop on Discourse Relation Parsing and Treebanking 2019
DOI: 10.18653/v1/W19-2716 (https://doi.org/10.18653/v1/W19-2716)
Citations: 5

Abstract

The DISRPT 2019 workshop organized a shared task on identifying discourse segments across formalisms and languages. Elementary Discourse Units (EDUs) are quite similar across different theories, and segmentation is the very first stage of rhetorical annotation. Still, each annotation project has adopted decisions that affect not only the annotation of the relational discourse structure but also the segmentation stage. For this shared task, we used pre-trained word embeddings and a neural network (BiLSTM+CRF) to perform the segmentation. We report F1 results for 6 languages: Basque (0.853), English (0.919), French (0.907), German (0.913), Portuguese (0.926) and Spanish (0.868 and 0.769). Finally, we carried out an error analysis based on clause typology for Basque and Spanish in order to understand the performance of the segmenter.
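To make the reported F1 figures concrete, the following is a minimal sketch of how segmentation can be framed as token-level boundary tagging and scored. It assumes that each token is labelled as either opening a new segment or not, and that precision, recall and F1 are computed over the segment-initial tokens; the abstract does not spell out the scoring procedure, so this is an illustrative assumption. The function names (`boundaries_to_segments`, `boundary_prf`) and the example sentence are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def boundaries_to_segments(tokens: List[str], starts: List[bool]) -> List[List[str]]:
    """Group tokens into segments; starts[i] is True when token i opens a segment."""
    segments: List[List[str]] = []
    for token, is_start in zip(tokens, starts):
        # Open a new segment at a boundary, or at the first token if it is unmarked.
        if is_start or not segments:
            segments.append([])
        segments[-1].append(token)
    return segments


def boundary_prf(gold: List[bool], pred: List[bool]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over segment-initial tokens (assumed scoring scheme)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    true_pos = sum(1 for g, p in zip(gold, pred) if g and p)
    n_pred = sum(pred)
    n_gold = sum(gold)
    precision = true_pos / n_pred if n_pred else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical example: the gold annotation splits the sentence before "we",
# while the (imagined) system output places the boundary one token too early.
tokens = ["Although", "it", "rained", ",", "we", "went", "out", "."]
gold = [True, False, False, False, True, False, False, False]
pred = [True, False, False, True, False, False, False, False]

print(boundaries_to_segments(tokens, gold))
print(boundary_prf(gold, pred))
```

Under this scheme a boundary that is off by a single token counts as both a false positive and a false negative, which is one reason an error analysis by clause type can be informative.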