GMU-WLV at TSAR-2022 Shared Task: Evaluating Lexical Simplification Models

Proceedings of the Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022). DOI: 10.18653/v1/2022.tsar-1.30

Kai North, Alphaeus Dmonte, Tharindu Ranasinghe, Marcos Zampieri

Citations: 7

Abstract

This paper describes the GMU-WLV team's submission to the TSAR-2022 shared task on multilingual lexical simplification. The goal of the task is to automatically provide a set of candidate substitutions for complex words in context. The organizers provided participants with ALEXSIS, a manually annotated dataset whose instances are split between a small trial set, with a dozen instances in each of the three languages of the competition (English, Portuguese, and Spanish), and a test set with over 300 instances in the same three languages. To cope with the lack of training data, participants had to use either alternative data sources or pre-trained language models. We experimented with monolingual models: BERTimbau, ELECTRA, and RoBERTa-large-BNE. Our best system ranked 1st out of sixteen systems for Portuguese, 8th out of thirty-three systems for English, and 6th out of twelve systems for Spanish.
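To make the task concrete, the following is a minimal sketch of how candidate substitutions are commonly produced with a masked language model: the complex word is replaced by a mask token, the model scores possible fillers, and the fillers are filtered and ranked. This is an illustration under stated assumptions, not the system described in the paper. The function names, the `[MASK]` token, and the toy scores are all hypothetical; in practice the scores would come from a pre-trained model such as those named in the abstract.

```python
from typing import Dict, List

MASK = "[MASK]"  # assumed mask token; real models define their own


def mask_target(sentence: str, target: str, mask: str = MASK) -> str:
    """Replace the first occurrence of the complex word with the mask token."""
    if target not in sentence:
        raise ValueError(f"target {target!r} not found in sentence")
    return sentence.replace(target, mask, 1)


def rank_candidates(scores: Dict[str, float], target: str, k: int = 10) -> List[str]:
    """Filter and rank filler words by score (hypothetical post-processing).

    Drops the target word itself and any non-alphabetic token (for example
    subword pieces such as '##ly'), lowercases the rest, and returns the
    top-k words by descending score, with ties broken alphabetically.
    """
    best: Dict[str, float] = {}
    for word, score in scores.items():
        w = word.strip().lower()
        if not w.isalpha() or w == target.lower():
            continue
        # Keep the highest score if the same word appears in several casings.
        best[w] = max(score, best.get(w, float("-inf")))
    return sorted(best, key=lambda w: (-best[w], w))[:k]


if __name__ == "__main__":
    sentence = "The committee reached a unanimous verdict."
    target = "unanimous"
    print(mask_target(sentence, target))

    # Toy scores standing in for masked-language-model output (invented values).
    toy_scores = {"unanimous": 0.40, "united": 0.25, "Joint": 0.15,
                  "common": 0.10, "##ly": 0.05}
    print(rank_candidates(toy_scores, target, k=3))
```

The filtering step matters because a masked language model tends to rank the original word, or fragments of words, among its top predictions, and neither is a useful simplification.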



