Sentence Similarity Recognition in Portuguese from Multiple Embedding Models

Ana Carolina Rodrigues, R. Marcacini
{"title":"Sentence Similarity Recognition in Portuguese from Multiple Embedding Models","authors":"Ana Carolina Rodrigues, R. Marcacini","doi":"10.1109/ICMLA55696.2022.00029","DOIUrl":null,"url":null,"abstract":"Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features due to differences in algorithm design and characteristics of the datasets employed in the pre-trained process. The perspective of benefiting from different encoded features to generate more suitable representations motivated the assembly of multiple embedding models, so-called meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformers architecture-based systems have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network to combine contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple sentence pre-trained embedding models outperforms single models and can be a promising alternative to improve performance sentence similarity. 
Moreover, we also discuss the results considering our simple extension of a model explainability method to the meta-embedding context, allowing the visual identification of the impact of each token on the sentence similarity score.","PeriodicalId":128160,"journal":{"name":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA55696.2022.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Distinct pre-trained embedding models perform differently in sentence similarity recognition tasks. The current assumption is that they encode different features due to differences in algorithm design and in the characteristics of the datasets used during pre-training. The prospect of benefiting from these different encoded features to generate more suitable representations has motivated the combination of multiple embedding models, so-called meta-embedding. Meta-embedding methods combine different pre-trained embedding models to perform a task. Recently, multiple pre-trained language representations derived from Transformer-based architectures have been shown to be effective in many downstream tasks. This paper introduces a supervised meta-embedding neural network that combines contextualized pre-trained models for sentence similarity recognition in Portuguese. Our results show that combining multiple pre-trained sentence embedding models outperforms single models and can be a promising alternative for improving sentence similarity performance. Moreover, we also discuss the results in light of our simple extension of a model explainability method to the meta-embedding context, which allows visual identification of the impact of each token on the sentence similarity score.
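The core idea can be sketched as follows. This is a minimal, hypothetical illustration of concatenation-based meta-embedding with a learned projection, not the paper's actual architecture (the paper trains a supervised neural network whose exact layout is not given in the abstract); the dimensions, the random stand-in encoders, and the single linear projection are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def meta_embed(sentence_embeddings, projection):
    """Concatenate the outputs of several pre-trained sentence
    encoders and project them into one shared meta-embedding space."""
    concat = np.concatenate(sentence_embeddings)
    return projection @ concat

def cosine_similarity(a, b):
    """Standard cosine similarity between two meta-embeddings."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy stand-ins for two pre-trained encoders with different output
# sizes (e.g. a 768-dim and a 512-dim Transformer sentence model).
sent1_enc_a = rng.normal(size=768)
sent1_enc_b = rng.normal(size=512)
sent2_enc_a = rng.normal(size=768)
sent2_enc_b = rng.normal(size=512)

# One learned projection, shared by both sentences; in the paper this
# role is played by a supervised neural network trained on labeled pairs.
W = rng.normal(size=(256, 768 + 512)) / np.sqrt(768 + 512)

m1 = meta_embed([sent1_enc_a, sent1_enc_b], W)
m2 = meta_embed([sent2_enc_a, sent2_enc_b], W)

score = cosine_similarity(m1, m2)
print(f"meta-embedding similarity: {score:.4f}")
```

In practice the projection (or a deeper network in its place) is trained on sentence pairs with similarity labels, so the combined space weighs each source model's features according to their usefulness for the task.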