Overview of ADoBo 2021: Automatic Detection of Unassimilated Borrowings in the Spanish Press

Elena Álvarez Mellado, Luis Espinosa Anke, Julio Gonzalo Arroyo, Constantine Lignos, Jordi Porta-Zamorano
Proces. del Leng. Natural, published 2021-09-06. Journal article. DOI: 10.26342/2021-67-24 (https://doi.org/10.26342/2021-67-24)

Abstract

This paper summarizes the main findings of ADoBo 2021, the IberLEF 2021 shared task on detecting lexical borrowings in the Spanish press. Participants were asked to detect unassimilated lexical borrowings (mostly anglicisms) in Spanish newswire text. The task was framed as a sequence labeling problem using BIO encoding. We provided participants with a corpus annotated for lexical borrowings, divided into training, development, and test splits. We received nine system runs from four teams. F1 scores range from 37 to 85, which suggests that borrowing detection remains an unsolved problem, especially for out-of-domain or out-of-vocabulary (OOV) words, that is, borrowings not seen previously, and that traditional lexicographic work could benefit from incorporating current NLP techniques.
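To make the BIO framing and the F1 figures concrete, here is a minimal Python sketch. It is not the organizers' evaluation code. The label name `ENG`, the example sentence, and the exact-match span-level scoring are all assumptions made for illustration; the abstract does not specify how F1 was computed.

```python
from typing import List, Set, Tuple

# (start token index, end index exclusive, label); the label name is hypothetical
Span = Tuple[int, int, str]


def spans_to_bio(n_tokens: int, spans: List[Span]) -> List[str]:
    """Encode borrowing spans as one BIO tag per token."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        tags[start] = f"B-{label}"          # first token of the borrowing
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # continuation tokens
    return tags


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode BIO tags back into spans; stray I- tags are ignored."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Exact-match span-level F1 (assumed metric): boundaries and label must agree."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented example sentence, not taken from the ADoBo corpus.
tokens = ["El", "podcast", "sobre", "fake", "news", "tuvo", "éxito"]
gold_spans = [(1, 2, "ENG"), (3, 5, "ENG")]   # "podcast", "fake news"
gold_tags = spans_to_bio(len(tokens), gold_spans)

# A hypothetical system output that truncates the multiword borrowing.
pred_tags = ["O", "B-ENG", "O", "B-ENG", "O", "O", "O"]
score = span_f1(bio_to_spans(gold_tags), bio_to_spans(pred_tags))
```

Under exact-match scoring, the truncated prediction for "fake news" earns no partial credit, so multiword and previously unseen borrowings weigh heavily on the final score.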