University of Padova @ DIACR-Ita

EVALITA Evaluation of NLP and Speech Tools for Italian, December 17th, 2020. DOI: 10.4000/BOOKS.AACCADEMIA.7618

Benyou Wang, Emanuele Di Buccio, M. Melucci

Citations: 2

Abstract

The semantic change detection task is challenging in a relatively low-resource language such as Italian. Using contextualized word embeddings, we formalize the task as measuring the distance between two variable-size sets of vectors. We use several distance measures: average Euclidean distance, average Canberra distance, Hausdorff distance, and the Jensen–Shannon divergence between cluster distributions obtained with K-means clustering and a Gaussian mixture model. The final prediction is an ensemble of the top-ranked words under each distance measure. The proposed method outperformed the frequency-based and collocation-based baselines.
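The following is a minimal NumPy sketch of the set-distance formulation described in the abstract. It is not the authors' implementation. It assumes that each word's usages in the two periods are given as matrices `A` and `B`, with one contextualized embedding per row. It also makes three interpretive choices. First, "average" distance is read as the mean over all cross-period pairs. Second, K-means is run on the pooled usages, so both periods share one set of clusters. Third, the ensemble is a hypothetical majority vote over per-metric top-k lists. All helper names (`pairwise`, `avg_distance`, `hausdorff`, `kmeans_labels`, `cluster_jsd`, `ensemble_vote`) are illustrative, and the Gaussian mixture variant is omitted.

```python
import numpy as np


def pairwise(A, B, metric="euclidean"):
    """|A| x |B| matrix of distances between the rows of A and the rows of B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    diff = A[:, None, :] - B[None, :, :]  # broadcast to (|A|, |B|, dim)
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "canberra":
        num = np.abs(diff)
        den = np.abs(A)[:, None, :] + np.abs(B)[None, :, :]
        # Coordinates where both values are 0 contribute 0 (0/0 treated as 0).
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(den > 0, num / den, 0.0)
        return terms.sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")


def avg_distance(A, B, metric="euclidean"):
    """Mean distance over all cross-set pairs (assumed reading of 'average')."""
    return float(pairwise(A, B, metric).mean())


def hausdorff(A, B):
    """Symmetric Hausdorff distance under the Euclidean metric."""
    D = pairwise(A, B)
    # Farthest point of A from B, and farthest point of B from A.
    return float(max(D.min(axis=1).max(), D.min(axis=0).max()))


def kmeans_labels(X, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = pairwise(X, centers).argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def jsd(p, q):
    """Jensen-Shannon divergence in bits, so the result lies in [0, 1]."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log 0 = 0; b = m is positive wherever a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_jsd(A, B, k=2, seed=0):
    """Cluster the pooled usages, then compare the two periods' cluster histograms."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    labels = kmeans_labels(np.vstack([A, B]), k, seed=seed)
    p = np.bincount(labels[: len(A)], minlength=k) / len(A)
    q = np.bincount(labels[len(A):], minlength=k) / len(B)
    return jsd(p, q)


def ensemble_vote(scores_per_metric, top_k):
    """Hypothetical ensemble: flag a word if it is in the top_k of at least
    half of the metrics. scores_per_metric maps metric -> {word: score}."""
    votes = {}
    for scores in scores_per_metric.values():
        for word in sorted(scores, key=scores.get, reverse=True)[:top_k]:
            votes[word] = votes.get(word, 0) + 1
    return {w for w, v in votes.items() if v >= len(scores_per_metric) / 2}
```

For example, `avg_distance([[0.0, 0.0]], [[3.0, 4.0]])` is 5.0 and the Canberra version is 2.0. Replacing `kmeans_labels` with hard assignments from a Gaussian mixture model would give the GMM-based variant. The measures respond to different kinds of change. The pairwise averages react to any overall shift in usage. The Hausdorff distance can be driven by a single outlying usage. The cluster-based divergence reacts only when usages redistribute across clusters, which act as a rough proxy for senses.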
